Approximately Counting Triangles in Sublinear Time


Talya Eden   Amit Levi   Dana Ron   C. Seshadhri


Abstract 

We consider the problem of estimating the number of triangles in a graph. This problem 
has been extensively studied in both theory and practice, but all existing algorithms read the 
entire graph. In this work we design a sublinear-time algorithm for approximating the number 
of triangles in a graph, where the algorithm is given query access to the graph. The allowed 
queries are degree queries, vertex-pair queries and neighbor queries. 

We show that for any given approximation parameter $0 < \epsilon < 1$, the algorithm provides
an estimate $\hat{t}$ such that with high constant probability, $(1-\epsilon)\cdot t \le \hat{t} \le (1+\epsilon)\cdot t$, where $t$ is
the number of triangles in the graph $G$. The expected query complexity of the algorithm is
$\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon)$, where $n$ is the number of vertices in the graph and
$m$ is the number of edges, and the expected running time is $\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon)$. We
also prove that $\Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$ queries are necessary, thus establishing that the query
complexity of this algorithm is optimal up to polylogarithmic factors in $n$ (and the dependence
on $1/\epsilon$).

1 Introduction


Counting the number of triangles in a graph is a fundamental algorithmic problem. In the study 
of complex networks and massive real-world graphs, triangle counting is a key operation in graph 
analysis for bioinformatics, social networks, community analysis, and graph modeling [HL70, Col88,
Por00, EM02, MSOI+02, Bur04, BBCG08, FWVDC10, BHLP11, SKP12]. In the theoretical
computer science community, the primary tool for counting the number of triangles is fast matrix
multiplication [IR78, AYZ97, BPWZ14]. On the more applied side, there is a plethora of
provable and practical algorithms that employ clever sampling methods for approximate triangle
counting [CN85, SW05b, SW05a, Tso08, TKMF09, Avr10, KMPT12, CC11, SV11, TKM11,
AKM13, SPK13, TPT13]. Triangle counting has also been a popular problem in the streaming
setting [BYKS02, JG05, BFL+06, AGM12, KMSS12, JSP13, PTTW13, TPT13, ADNK14].

All these algorithms read the entire graph, which may be time consuming when the graph is
very large. In this work, we focus on sublinear algorithms for triangle counting. We assume the
following query access to the graph, which is standard for sublinear algorithms that approximate
graph parameters. The algorithm can make: (1) Degree queries, in which the algorithm can query
the degree $d_v$ of any vertex $v$. (2) Neighbor queries, in which the algorithm can query what vertex
is the $i$-th neighbor of a vertex $v$, for any $i \le d_v$. (3) Vertex-pair queries, in which the algorithm
can query for any pair of vertices $v$ and $u$ whether $(u, v)$ is an edge.

Gonen et al. [GRS11], who studied the problem of approximating the number of stars in a
graph in sublinear time, also considered the problem of approximating the number of triangles in
sublinear time. They proved that there is no sublinear approximation algorithm for the number
of triangles when the algorithm is allowed to perform degree and neighbor queries (but not pair
queries).^1

They asked whether a sublinear algorithm exists when allowed vertex-pair queries in addition 
to degree and neighbor queries. We show that this is indeed the case. 


1.1 Results 

Let $G$ be a graph with $n$ vertices, $m$ edges, and $t$ triangles. We describe an algorithm that, given
an approximation parameter $0 < \epsilon < 1$ and query access to $G$, outputs an estimate $\hat{t}$, such that
with high constant probability (over the randomness of the algorithm), $(1-\epsilon)\cdot t \le \hat{t} \le (1+\epsilon)\cdot t$.
The expected query complexity of the algorithm is

$$\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon),$$

and the expected running time is $\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot \mathrm{poly}(\log n, 1/\epsilon)$. We show that this result is almost
optimal by proving that the number of queries performed by any multiplicative-approximation
algorithm for the number of triangles in a graph is

$$\Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right).$$


^1 To be precise, they showed that there exist two families of graphs over $m = \Theta(n)$ edges, such that all graphs in
one family have $\Theta(n)$ triangles, all graphs in the other family have no triangles, but in order to distinguish between
a random graph in the first family and a random graph in the second family, it is necessary to perform $\Omega(n)$ degree
and neighbor queries.

1.2 Overview of the algorithm

For the sake of clarity, we suppress any dependencies on the approximation parameter $\epsilon$ and on
$\log n$ using the notation $O^*(\cdot)$.

1.2.1 A simple oracle-based procedure for a 1/3-estimate 

First, let us assume access to an oracle that, given a vertex $v$, returns $t_v$, the number of triangles that
$v$ is incident to. Note that $t = \frac{1}{3}\sum_{v \in V} t_v$. An unbiased estimate is obtained by sampling, uniformly
at random, a (multi-)set $S$ of $s$ vertices, and outputting $\hat{t}_S = \frac{1}{3}\cdot\frac{n}{s}\cdot\sum_{v \in S} t_v$. However, this estimate can
have extremely large variance (consider the “wheel” graph where there is one vertex with $t_v = \Theta(n)$
and all other $t_v$'s are constant). Inspired by work on estimating the average degree [Fei06, GR08],
we can reduce the variance by simply “cutting off” the contribution of vertices $v$ for which $t_v$ is
above a certain threshold. Call such vertices heavy, and call the remaining vertices light. If the threshold
is set to $t^{2/3}/\epsilon^{1/3}$, then the number of heavy vertices is $O((\epsilon t)^{1/3})$. This implies that the total
number of triangles in which all three endpoints are heavy is $O(\epsilon t)$.

Hence, suppose we define $\tilde{t}_v$ to be $t_v$ if $t_v \le t^{2/3}/\epsilon^{1/3}$ and $0$ otherwise, and consider
$\tilde{t}_S = \frac{1}{3}\cdot\frac{n}{s}\cdot\sum_{v \in S}\tilde{t}_v$. We can argue that $E[\tilde{t}_S] \in [(1/3 - \epsilon)t, t]$, since (roughly speaking) every triangle
that contains at least one light vertex is counted at least once. Since $\tilde{t}_v$ ranges between $0$ and
$t^{2/3}/\epsilon^{1/3}$, by applying the multiplicative Chernoff bound, a sample of size $s = O^*(n/t^{1/3})$ is sufficient
to ensure that with high constant probability $\tilde{t}_S$ is in the range $[(\frac{1}{3} - 2\epsilon)\cdot t, (1+\epsilon)\cdot t]$.
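
The effect of the cut-off can be checked on the wheel graph mentioned above. The following is a minimal sketch, not part of the algorithm: it computes every $t_v$ exactly (playing the role of the fictitious oracle) and evaluates the exact expectation of the estimate, which equals one third of the sum of the (possibly truncated) $t_v$ values. The helper names, the wheel size and the threshold value 2 are assumptions chosen for illustration.

```python
from itertools import combinations


def wheel(k):
    """Wheel graph: hub 0 joined to every vertex of the cycle 1..k."""
    adj = {v: set() for v in range(k + 1)}
    for i in range(1, k + 1):
        for a, b in ((0, i), (i, i % k + 1)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def triangles_per_vertex(adj):
    """Exact t_v for every vertex; stands in for the oracle."""
    counts = {}
    for v in adj:
        counts[v] = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    return counts


def expected_estimate(adj, threshold):
    """Exact expectation of (1/3)*(n/s)*sum over S of the truncated t_v.

    For a uniform sample S this is (1/3) * sum over all v of the truncated t_v.
    """
    t_v = triangles_per_vertex(adj)
    return sum(tv for tv in t_v.values() if tv <= threshold) / 3


adj = wheel(12)
t = sum(triangles_per_vertex(adj).values()) // 3
untruncated = expected_estimate(adj, threshold=float("inf"))  # equals t
truncated = expected_estimate(adj, threshold=2)               # hub is cut off
```

In this example every triangle has one heavy endpoint (the hub) and two light ones, so the truncated expectation is two thirds of $t$, which lies inside the range $[t/3, t]$ claimed above.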

1.2.2 Assigning weights to triangles so as to improve the estimate 

To improve the approximation, we assign weights to triangles inversely proportional to the number
of their light endpoints (rather than assigning a uniform weight of $\frac{1}{3}$ as is done when defining
$\tilde{t}_S$). If for each light vertex $v$ we let $\mathrm{wt}(v)$ be the sum over the weights of all
triangles that $v$ participates in and for each heavy vertex $v$ we let $\mathrm{wt}(v) = \tilde{t}_v = 0$, then the expected
value of $\frac{n}{s}\sum_{v \in S}\mathrm{wt}(v)$ is in $[(1 - O(\epsilon))\cdot t, (1 + O(\epsilon))\cdot t]$.

To get rid of the fictitious oracle, we must resolve two issues. The first issue is efficiently deciding
whether a vertex is heavy or light, and the second is approximating $\frac{n}{s}\sum_{v \in S}\mathrm{wt}(v)$, assuming we
have a procedure for deciding whether a vertex is heavy or light. We next discuss each of these two
issues. For convenience, we will assume that the algorithm already has constant factor estimates
for $m$ and $t$. This assumption can be removed by approximating $m$ and performing a geometric search on $t$.

1.2.3 Deciding whether a vertex is heavy 

Let $v$ be a fixed vertex with degree $d_v$. Consider an edge $e$ incident to $v$, and let $u$ be the other
endpoint of this edge. Let $t_e$ denote the number of triangles that $e$ belongs to. Consider the random
variable $Y$ defined by selecting, uniformly at random, a neighbor $w$ of $u$, and setting $Y = d_u$ if $(v, w)$
is an edge (so that $(v, u, w)$ is a triangle) and $Y = 0$ otherwise. Since the number of neighbors
of $u$ that form a triangle with $v$ is $t_e$, the expected value of $Y$ is $\frac{t_e}{d_u}\cdot d_u = t_e$. Now consider
selecting (uniformly at random) several edges incident to $v$, denoted $e_1, \ldots, e_r$, and for each edge
$e_j$ selected, defining the corresponding random variable $Y_j$. Then the expected value of $\frac{1}{r}\sum_{j=1}^{r} Y_j$
is $\frac{1}{d_v}\cdot\sum_{e=(v,u)} t_e = \frac{2}{d_v}\cdot t_v$. If we multiply by $d_v/2$, then we get an unbiased estimator for $t_v$, which
in particular can indicate whether $v$ is heavy or light.
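
The unbiasedness claim can be verified by enumeration on a small graph. The sketch below is only an illustration under assumed names: it averages $Y$ over every choice of edge $(v, u)$ and neighbor $w$ of $u$ with exact rational arithmetic, and then multiplies by $d_v/2$. The example graph (a 4-clique with one pendant edge) is an arbitrary choice.

```python
from fractions import Fraction


def build(edge_list):
    """Adjacency sets of an undirected simple graph."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def expected_y(adj, v):
    """Exact E[Y] for a uniform edge (v, u) and a uniform neighbor w of u."""
    d_v = len(adj[v])
    total = Fraction(0)
    for u in adj[v]:
        d_u = len(adj[u])
        for w in adj[u]:
            # Y = d_u when (v, w) is an edge, i.e. (v, u, w) is a triangle.
            y = d_u if w in adj[v] else 0
            total += Fraction(y, d_v * d_u)
    return total


# 4-clique on {0, 1, 2, 3} plus the pendant edge (3, 4); vertex 3 lies in 3 triangles.
adj = build([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
estimate = expected_y(adj, 3) * Fraction(len(adj[3]), 2)
```

Each triangle at $v$ is seen from both of its edges incident to $v$, which is where the factor of 2 in the normalization comes from.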

However, once again the difficulty is with the variance of this estimator and the implication
on the complexity of the resulting decision procedure. To address these difficulties we modify
the procedure described above as follows. First, if $d_v$ is above a certain threshold, then $v$ is also
considered heavy (where this threshold is of order $m/(\epsilon t)^{1/3}$, so that the total number of
triangles in which all three endpoints are heavy remains $O(\epsilon t)$). Second, observe that when trying
to estimate the number of triangles that an edge $e_j = (v, x_j)$ participates in, we can either select
a random neighbor $w$ of $v$ and check whether $(x_j, w) \in E$, or we can select a random neighbor
$w$ of $x_j$ and check whether $(v, w) \in E$. Since it is advantageous for the sake of the complexity
to consider the endpoint that has a smaller degree, we do the following. Each time we select an
edge $e_j = (v, x_j)$ incident to $v$, we let $u_j$ be the endpoint of $e_j$ that has smaller degree. If $d_{u_j}$ is
relatively large (larger than $\sqrt{m}$), then we select $k = \lceil d_{u_j}/\sqrt{m}\rceil$ neighbors of $u_j$ and let $Y_j$ equal
$d_{u_j}$ times the fraction among these neighbors that close a triangle with $e_j$. The setting of $k$ implies
a bound on the variance of $Y_j$ (conditioned on the choice of $e_j$), which is $\sqrt{m}$ times its expected
value, $t_{e_j}$. Third, in order to bound the variance due to the random choice of edges $e_j$ incident to
$v$, we do the following. We assign each triangle that $v$ participates in to a unique edge incident to
$v$ and modify the definition of $t_e$ to be the number of such triangles that are assigned to $e$. The
assignment is such that $t_e$ is always upper bounded by $O(\sqrt{m})$. Finally, we perform a standard
median selection over $O(\log n)$ repetitions of the procedure.

Our analysis shows that it suffices to set $r$ (the number of random edges incident to $v$ that are
selected) to be $O^*(m^{3/2}/t)$ so as to ensure the correctness of the procedure (with high probability).

In the analysis of the expected query complexity and running time of the procedure we have to
take into account the number of iterations $k = \lceil d_{u_j}/\sqrt{m}\rceil$ for each selected (lower degree endpoint)
$u_j$ and argue that for every vertex $v$, the expected number of these iterations is a constant.

1.2.4 Estimating $\sum_{v \in S}\mathrm{wt}(v)$

Suppose we have a (multi-)set $S$ of vertices such that $\frac{n}{s}\cdot\sum_{v \in S}\mathrm{wt}(v)$ is indeed in
$[(1 - O(\epsilon))\cdot t, (1 + O(\epsilon))\cdot t]$ (which we know occurs with high probability if we select $s = O^*(n/t^{1/3})$ vertices
uniformly at random). Consider the set of edges incident to vertices in $S$, where we view edges as
directed, so that if there is an edge between $v$ and $v'$ that both belong to $S$, then $(v, v')$ and $(v', v)$
are considered as two different edges. We denote this set of edges by $E_S$, and their number by $d_S$,
where $d_S = \sum_{v \in S} d_v$. Suppose that for each edge $e = (v, x)$ we assign a weight $\mathrm{wt}(e)$, which is the
sum of the weights of all triangles that $v$ participates in and are assigned to $e$ (where the weight
of a triangle is as defined previously based on the number of light endpoints that it has). Then
$\sum_{e \in E_S}\mathrm{wt}(e) = \sum_{v \in S}\mathrm{wt}(v)$.

The next idea is to sample edges in $E_S$ uniformly at random, and for each selected edge $e = (v, x)$
to estimate $\mathrm{wt}(e)$. An important observation is that since we can query the degrees of all vertices
in $S$, we can efficiently select uniform random edges in $E_S$ (as opposed to the more difficult task
of selecting random edges from the entire graph). Similarly to what was described in the decision
procedure for heavy vertices, given an edge $e \in E_S$ we let $u$ be its endpoint that has smaller degree.
We then select $\lceil d_u/\sqrt{m}\rceil$ random neighbors of $u$ and for each check whether it closes a triangle with
$e$. For each triangle found that is assigned to $e$, we check how many heavy endpoints it has (using
the aforementioned procedure for detecting heavy vertices) so as to compute the weight of the
triangle. In this manner we can obtain random variables whose expected value is $\frac{1}{d_S}\sum_{v \in S}\mathrm{wt}(v)$
and whose variance is not too large (upper bounded by $\sqrt{m}$ times this expected value). We can
now take an average over sufficiently many ($O^*(m^{3/2}/t)$) such random variables and multiply by
$d_S\cdot\frac{n}{s}$. By upper bounding the probability that $d_S$ is much larger than its expected value we can
prove that the output of the algorithm is as desired. The expected query complexity and running
time of the algorithm are $O^*\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)$.

Finally we note that if $t < m^{1/2}$, so that $m^{3/2}/t > m$, then we can replace $m^{3/2}/t$ with $m$ in the
upper bound on the query complexity since we can store all queried edges so that no edge needs to
be queried more than twice (once from each endpoint).
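
To see how the two terms of the bound trade off, the following small sketch evaluates $n/t^{1/3} + \min\{m, m^{3/2}/t\}$ while ignoring the $\mathrm{poly}(\log n, 1/\epsilon)$ factor. The function name and the sample values of $n$, $m$ and $t$ are assumptions made purely for illustration.

```python
def query_bound(n, m, t):
    """n / t^(1/3) + min(m, m^(3/2) / t), without the poly(log n, 1/eps) factor."""
    return n / t ** (1 / 3) + min(m, m ** 1.5 / t)


n = m = 10 ** 6
many_triangles = query_bound(n, m, t=10 ** 6)  # t >= sqrt(m): second term is m^(3/2)/t
few_triangles = query_bound(n, m, t=100)       # t < sqrt(m): second term is capped at m
```

With $t = 10^6$ the bound is roughly $1.1 \cdot 10^4$, far below $m$; with $t = 100 < \sqrt{m}$ the second term alone is already $m$, so nothing is gained over reading every edge.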


1.3 A high level discussion of the lower bound 

Proving that every multiplicative-approximation algorithm must perform $\Omega(n/t^{1/3})$ queries is fairly
straightforward, and our main focus is on proving that $\Omega\left(\min\left\{m, m^{3/2}/t\right\}\right)$ queries are necessary
as well. In order to prove this claim we define, for every $n$, every $1 \le m \le \binom{n}{2}$ and every
$1 \le t \le \min\{\binom{n}{3}, m^{3/2}\}$, a graph $G_1$ and a family of graphs $\mathcal{G}_2$ for which the following holds:
(1) The graph $G_1$ and all the graphs in $\mathcal{G}_2$ have $n$ vertices and $m$ edges. (2) In $G_1$ there are no
triangles, while the number of triangles in each graph $G \in \mathcal{G}_2$ is $\Theta(t)$. We prove that for values of $t$
such that $t \ge \sqrt{m}$, at least $\Omega(m^{3/2}/t)$ queries are required in order to distinguish with high constant
probability between $G_1$ and a random graph in $\mathcal{G}_2$. We then prove that for values of $t$ such that
$t < \sqrt{m}$, at least $\Omega(m)$ queries are required for this task. We give three different constructions for
$G_1$ and $\mathcal{G}_2$ depending on the value of $t$ as a function of $m$ (where two of the constructions are for
subcases of the case that $t \ge \sqrt{m}$). For further discussion of the lower bound, see Section 4.


1.4 Related Work 

1.4.1 Approximating graph parameters in sublinear time 


We build on previous work on approximating the average degree of a graph and the number of
stars [Fei06, GR08, GRS11]. Feige [Fei06] investigated the problem of estimating the average degree
of a graph, denoted $\bar{d}$, when given query access to the degrees of the vertices. By performing
a careful variance analysis, Feige proved that $O\left(\sqrt{n/\bar{d}}/\epsilon\right)$ queries are sufficient in order to obtain
a $(\frac{1}{2} - \epsilon)$-approximation of $\bar{d}$. He also proved that a better approximation ratio cannot be
achieved in sublinear time using only degree queries. The same problem was considered by Goldreich
and Ron [GR08]. Goldreich and Ron proved that a $(1 + \epsilon)$-approximation can be achieved
with $O\left(\sqrt{n/\bar{d}}\right)\cdot\mathrm{poly}(\log n, 1/\epsilon)$ queries, if neighbor queries are also allowed.

Building on these ideas, Gonen et al. [GRS11] considered the problem of approximating the
number of $s$-stars in a graph. Their algorithm only used neighbor and degree queries. A major
difference between stars and triangles is that the former are non-induced subgraphs, while the
latter are induced. Additional work on sublinear algorithms for estimating other graph parameters includes
algorithms for approximating the size of the minimum weight spanning tree [CRT05, CS09, CEF+05],
maximum matching [NO08, YYI09] and of the minimum vertex cover [PR07, NO08, MR09, YYI09,
HKNO09, ORRR12].


1.4.2 Triangle counting 

Triangle counting has a rich history. A classic result of Itai and Rodeh showed that triangles
can be enumerated in $O(m^{3/2})$ time, and a more elegant algorithm was given by Chiba and
Nishizeki [CN85]. The connections to matrix multiplication have been exploited for faster theoretical
algorithms [IR78, AYZ97, BPWZ14]. In practice, there is a diverse body of work on
counting triangles using different techniques, for different models. There are serial algorithms based
on eigenvalue methods [Tso08, Avr10], graph sparsification [TDM+09, KMPT12, TKM11, PT12],
and sampling paths [SW05b, SPK13]. Triangle counters have been given for MapReduce [Coh09,
SV11, KPP+13]; external memory models [CC11]; distributed settings [AKM13]; semi-streaming
models [BBCG08, KMPT12]; one-pass streaming [BYKS02, JG05, BFL+06, AGM12, KMSS12,
JSP13, PTTW13, TPT13, ADNK14]. It is worth noting that across the board, all these algorithms
required reading the entire graph.

Most relevant to our work are various sampling algorithms that set up a random variable whose
expectation is directly related to the triangle count [SW05b, KMPT12, JG05, BFL+06, SPK13,
JSP13, PTTW13, TPT13, ADNK14]. Typically, this involves sampling some set of vertices or edges
to get a set of three vertices. The algorithm checks whether the sampled set induces a triangle, and
uses the probability of success to estimate the triangle count. We follow the same basic philosophy,
but it is significantly more challenging to set up the “right” random experiment, since we cannot
read the entire graph.


2 Preliminaries 


Let $G = (V, E)$ be a simple graph with $|V| = n$ vertices and $|E| = m$ edges. For a vertex $v \in V$,
we denote by $d_v$ the degree of the vertex, by $\Gamma_v$ the set of $v$'s neighbors, and by $E_v$ the set of edges
incident to $v$. We denote by $T_v$ the set of triangles incident to the vertex $v$, and let $t_v = |T_v|$.
Similarly, the set of triangles in the graph $G$ is denoted by $T$, and the number of triangles in the
graph is denoted by $t$. We use $c, c_1, \ldots$ to denote sufficiently large constants.

We consider algorithms that can sample uniformly in V and perform three types of queries: 

1. Degree queries, in which the algorithm may query for the degree $d_v$ of any vertex $v$ of its
choice.

2. Neighbor queries, in which the algorithm may query for the $i$-th neighbor of any vertex $v$ of
its choice. If $i > d_v$, then a special symbol (e.g. †) is returned. No assumption is made on
the order of the neighbors of any vertex.

3. Pair queries, in which the algorithm may ask if there is an edge $(u, v) \in E$ between any pair
of vertices $u$ and $v$.
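
The query model can be pictured as a thin interface over a stored graph. The class below is a hypothetical sketch (the name `GraphOracle`, its method names, and the use of `None` as the special symbol are all assumptions); it also counts queries, which is the cost measure used throughout.

```python
class GraphOracle:
    """Query access to a simple graph, with a counter for the queries made."""

    def __init__(self, adjacency):
        # adjacency: dict mapping each vertex to a list of its neighbors
        self._lists = {v: list(ns) for v, ns in adjacency.items()}
        self._sets = {v: set(ns) for v, ns in self._lists.items()}
        self.queries = 0

    def degree(self, v):
        """Degree query: return d_v."""
        self.queries += 1
        return len(self._lists[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v (1-indexed), or None if i > d_v."""
        self.queries += 1
        if 1 <= i <= len(self._lists[v]):
            return self._lists[v][i - 1]
        return None

    def pair(self, u, v):
        """Pair query: is (u, v) an edge?"""
        self.queries += 1
        return v in self._sets[u]


oracle = GraphOracle({0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]})
answers = (oracle.degree(2), oracle.neighbor(2, 3), oracle.neighbor(3, 2), oracle.pair(0, 3))
```

A uniformly random neighbor of $v$ can be obtained with one degree query followed by one neighbor query at a random index.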


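We sometimes use set notations for operations on multisets. We use the notation $O^*(\cdot)$ to suppress
dependencies on the approximation parameter $\epsilon$ or on $\log n$.

We use the following variant of the multiplicative Chernoff bound. Let $\chi_1, \ldots, \chi_r$ be $r$ independent
random variables, such that $\chi_i \in [0, B]$ for some $B > 0$ and $E[\chi_i] = b$ for every $1 \le i \le r$. For
every $\gamma \in (0, 1]$ the following holds:

$$\Pr\left[\frac{1}{r}\sum_{i=1}^{r}\chi_i > (1+\gamma)b\right] < \exp\left(-\frac{\gamma^2\cdot b\cdot r}{3B}\right) \tag{1}$$

and

$$\Pr\left[\frac{1}{r}\sum_{i=1}^{r}\chi_i < (1-\gamma)b\right] < \exp\left(-\frac{\gamma^2\cdot b\cdot r}{2B}\right). \tag{2}$$
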


We will also make extensive use of Chebyshev's inequality: For a random variable $X$ and for
$\gamma > 0$,

$$\Pr\left[|X - E[X]| \ge \gamma\right] \le \frac{\mathrm{Var}[X]}{\gamma^2}.$$

We fix a total order on vertices denoted by $\prec$ as follows: $u \prec v$ if $d_u < d_v$, or $d_u = d_v$ and $u < v$
(in terms of id number). Given $u$ and $v$, two degree queries suffice to decide their ordering.

Claim 1. Fix any vertex $v$. The number of neighbors $w$ of $v$ such that $v \prec w$ is at most $\sqrt{2m}$.

Proof. Let $S = \{w \mid w \in \Gamma_v, v \prec w\}$. Naturally, $d_v \ge |S|$. By definition of $\prec$, for every $w \in S$, $d_w \ge d_v \ge |S|$.
Thus, $2m \ge \sum_{w \in S} d_w \ge |S|^2$ and $|S| \le \sqrt{2m}$. □
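
The order and Claim 1 are easy to check mechanically. The sketch below uses assumed helper names and the complete graph on 6 vertices as an arbitrary test case; it compares (degree, id) pairs to implement the order and counts, for each vertex, the neighbors that come later in it.

```python
import math


def precedes(adj, u, v):
    """u comes before v: smaller degree first, ties broken by vertex id."""
    return (len(adj[u]), u) < (len(adj[v]), v)


def higher_neighbors(adj, v):
    """Neighbors w of v with v before w in the order."""
    return [w for w in adj[v] if precedes(adj, v, w)]


def complete_graph(k):
    return {v: {w for w in range(k) if w != v} for v in range(k)}


adj = complete_graph(6)
m = sum(len(ns) for ns in adj.values()) // 2
worst = max(len(higher_neighbors(adj, v)) for v in adj)
```

Here all degrees are equal, so the order falls back to vertex ids; vertex 0 has 5 later neighbors, just under $\sqrt{2m} = \sqrt{30}$.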


3 The Algorithm 

We start by introducing the notions of heavy and light vertices and how they can be utilized 
in the context of estimating the number of triangles. We then give a procedure for deciding 
(approximately) whether a vertex is heavy or light. Using this procedure we give an algorithm for 
estimating the number of triangles based on the following assumption (which is later removed). 

Assumption 1. Our initial algorithm takes as input estimates $\overline{t}$ and $\overline{m}$ on the number of triangles
and edges in the graph respectively, such that

1. $t/4 \le \overline{t} \le t$.

2. $m/6 \le \overline{m}$.

Assumption 1 can be easily removed by performing a geometric search on $\overline{t}$ and using the
algorithm from [Fei06] to approximate $m$, as explained precisely in the proof of Theorem 13.

For every vertex $v$, we view the set of edges $E_v$ as directed edges originating from $v$. We then
associate each triangle $(v, x, w) \in T_v$ with a unique edge $e \in E_v$, as defined next.

Definition 1. We say that a triangle $(v, x, w) \in T_v$ is associated with the directed edge $(v, x)$ if
$x \prec w$, and with $(v, w)$ otherwise. For a directed edge $\vec{e} = (v, x)$ we let $T_{\vec{e}}$ denote the set of triangles
that it is associated with, that is, the set of triangles $(v, x, w)$ such that $x \prec w$.

Since it will always be clear from the context from which vertex an edge we consider originates,
for the sake of succinctness, we drop the directed notation and use the notation $T_e$. We let $t_e = |T_e|$,
and for a fixed vertex $v$, we get $t_v = \sum_{e \in E_v} t_e$.
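
The association of triangles with directed edges can be illustrated on a toy graph. This is a sketch with assumed helper names; the example graph (a 4-clique with a pendant edge) is arbitrary. It counts, for each directed edge $(v, x)$, the triangles $(v, x, w)$ with $x \prec w$, and the counts add up to $t_v$.

```python
def build(edge_list):
    """Adjacency sets of an undirected simple graph."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def assigned_counts(adj, v):
    """t_e for each directed edge (v, x): triangles (v, x, w) with x before w."""
    key = lambda a: (len(adj[a]), a)  # the total order: degree, then id
    counts = {}
    for x in adj[v]:
        counts[(v, x)] = sum(1 for w in adj[x] if w in adj[v] and key(x) < key(w))
    return counts


# 4-clique on {0, 1, 2, 3} plus the pendant edge (3, 4); vertex 3 lies in 3 triangles.
adj = build([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
t_e = assigned_counts(adj, 3)
```

Each triangle at $v$ is charged to exactly one of its two edges at $v$, namely the one whose other endpoint is earlier in the order, which is why Claim 1 bounds every $t_e$ by $\sqrt{2m}$.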

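In all that follows we assume that $\epsilon < 1/2$, and otherwise we run the algorithm with $\epsilon = 1/2$.


3.1 Heavy and light vertices

Definition 2. We say that a vertex $v$ is heavy if $d_v > 2\overline{m}/(\epsilon\overline{t})^{1/3}$ or if $t_v > 2\overline{t}^{2/3}/\epsilon^{1/3}$. If $v$ is such that
$d_v \le 2\overline{m}/(\epsilon\overline{t})^{1/3}$ and $t_v \le \overline{t}^{2/3}/(2\epsilon^{1/3})$, then we say that $v$ is light.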

We shall say that a partition $(H, L)$ of $V$ is appropriate (with respect to $\overline{m}$ and $\overline{t}$) if every heavy
vertex belongs to $H$ and every light vertex belongs to $L$.


Note that for an appropriate partition $(H, L)$ both $H$ and $L$ may contain vertices that are
neither heavy nor light (but no light vertex belongs to $H$ and no heavy vertex belongs to $L$).

For a fixed partition $(H, L)$ we associate with each triangle $\Delta$ a weight depending on the number
of its endpoints that belong to $L$.

Definition 3. For a triangle $\Delta$ we define its weight $\mathrm{wt}_L(\Delta)$ to be

$$\mathrm{wt}_L(\Delta) = \begin{cases} 0 & \text{if no endpoints of } \Delta \text{ belong to } L \\ 1/\ell & \text{if } \Delta \text{ has } \ell > 0 \text{ endpoints that belong to } L. \end{cases}$$

Whenever it is clear from the context, we drop the subscript $L$ and use the notation $\mathrm{wt}(\cdot)$ instead
of $\mathrm{wt}_L(\cdot)$.

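Claim 2. If $(H, L)$ is appropriate and Assumption 1 holds, then the number of triangles with weight
0 is at most $c_H\cdot\epsilon t$ for some constant $c_H$.

Proof. By Assumption 1, the number of vertices $v$ such that $d_v$ is greater than $2\overline{m}/(\epsilon\overline{t})^{1/3}$ is at
most $2m/(2\overline{m}/(\epsilon\overline{t})^{1/3}) \le 6(\epsilon t)^{1/3}$, and the number of vertices $v$ such that $t_v > 2\overline{t}^{2/3}/\epsilon^{1/3}$ is at most
$3t/(2\overline{t}^{2/3}/\epsilon^{1/3}) \le 6(\epsilon t)^{1/3}$. Therefore, there are at most $\binom{12(\epsilon t)^{1/3}}{3} < 2000\epsilon t$ triangles with all three
endpoints in $H$. Setting $c_H = 2000$ completes the proof. □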


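Definition 4. For any set $T$ of triangles we define $\mathrm{wt}(T) = \sum_{\Delta \in T}\mathrm{wt}(\Delta)$. For a vertex $v \in L$ we
define $\mathrm{wt}(v) = \sum_{\Delta \in T_v}\mathrm{wt}(\Delta)$, and $\mathrm{wt}(v) = 0$ for $v \in H$.

Lemma 3. For any partition $(H, L)$, $\sum_{v \in L}\mathrm{wt}(v) \le t$. If $(H, L)$ is appropriate and Assumption 1
holds, then $\sum_{v \in L}\mathrm{wt}(v) \in [t(1 - c_H\cdot\epsilon), t]$.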

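Proof. Let $\chi(v, \Delta)$ be an indicator variable such that $\chi(v, \Delta) = 1$ if $\Delta$ contains the vertex $v$, and
$\chi(v, \Delta) = 0$ otherwise. Consider a triangle $\Delta$ that contains $\ell > 0$ light vertices. Then

$$\sum_{v \in L}\chi(v, \Delta) = \ell = 1/\mathrm{wt}(\Delta).$$

If $\ell = \mathrm{wt}(\Delta) = 0$, then the above expression equals 0. By interchanging summations,

$$\sum_{v \in L}\mathrm{wt}(v) = \sum_{v \in L}\sum_{\Delta \in T_v}\mathrm{wt}(\Delta) = \sum_{\Delta \in T}\mathrm{wt}(\Delta)\sum_{v \in L}\chi(v, \Delta) = t - |\{\Delta \mid \mathrm{wt}(\Delta) = 0\}|.$$

Clearly for any partition $(H, L)$ the above expression is at most $t$. On the other hand, if $(H, L)$ is
appropriate and Assumption 1 holds, then by Claim 2 we have that $|\{\Delta \mid \mathrm{wt}(\Delta) = 0\}| \le c_H\cdot\epsilon t$,
and the lemma follows. □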
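
The identity in the proof can be checked directly on a tiny example. The sketch below (assumed helper names, the 4-clique as an arbitrary test graph, partitions chosen by hand rather than by degree or triangle counts) sums $\mathrm{wt}(v)$ over $L$ with exact fractions and compares the result with $t$ minus the number of weight-0 triangles.

```python
from fractions import Fraction
from itertools import combinations


def triangles(adj):
    """All triangles of the graph as sorted vertex triples."""
    return [(a, b, c) for a, b, c in combinations(sorted(adj), 3)
            if b in adj[a] and c in adj[a] and c in adj[b]]


def weight(tri, light):
    """wt(triangle): 1/l if it has l > 0 endpoints in L, and 0 otherwise."""
    l = sum(1 for v in tri if v in light)
    return Fraction(1, l) if l else Fraction(0)


def total_light_weight(adj, light):
    """Sum over v in L of wt(v)."""
    tris = triangles(adj)
    return sum(weight(tri, light) for v in light for tri in tris if v in tri)


k4 = {v: {w for w in range(4) if w != v} for v in range(4)}
t = len(triangles(k4))
two_light = total_light_weight(k4, light={0, 1})  # every triangle touches L
one_light = total_light_weight(k4, light={0})     # triangle (1, 2, 3) has weight 0
```

With $L = \{0\}$ exactly one triangle has all endpoints in $H$, so the sum drops from $t = 4$ to $3$.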

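Theorem 4. Let $s = (c\log(n/\epsilon)/\epsilon^3)\cdot n/\overline{t}^{1/3}$ where $c$ is a constant, and let $S$ be a sample of $s$ vertices
$v_1, v_2, \ldots, v_s$ that are selected uniformly, independently at random. Then

$$E\left[\frac{1}{s}\sum_{i=1}^{s}\mathrm{wt}(v_i)\right] \le t/n.$$

Furthermore, if $(H, L)$ is appropriate and Assumption 1 holds, then

$$E\left[\frac{1}{s}\sum_{i=1}^{s}\mathrm{wt}(v_i)\right] \in [t(1 - c_H\cdot\epsilon)/n, t/n]$$

and for a sufficiently large constant $c$,

$$\Pr\left[\frac{1}{s}\sum_{i=1}^{s}\mathrm{wt}(v_i) < t(1 - 2c_H\cdot\epsilon)/n\right] < \frac{\epsilon}{n}.$$

Proof. Let $Y$ denote the random variable $\frac{1}{s}\sum_{i=1}^{s}\mathrm{wt}(v_i)$. By the first part of Lemma 3,
$E\left[\frac{1}{s}\sum_{i=1}^{s}\mathrm{wt}(v_i)\right] \le t/n$. Now assume that $(H, L)$ is appropriate and Assumption 1 holds. The
claim regarding the expected value of $Y$ follows from the second part of Lemma 3, so it remains to
prove the claim regarding the deviation from the expected value. Note that $\mathrm{wt}(v) \le t_v$ for every
vertex $v$, which for $v \in L$ is at most $2\overline{t}^{2/3}/\epsilon^{1/3}$. By the multiplicative Chernoff bound and by Item 1 in
Assumption 1,

$$\Pr\left[Y < (1-\epsilon)E[Y]\right] < \exp\left(-\frac{\epsilon^2\cdot E[Y]\cdot s}{4\overline{t}^{2/3}/\epsilon^{1/3}}\right) \le \exp\left(-\frac{\epsilon^2\cdot c\log(n/\epsilon)\left(n/(\epsilon^3\overline{t}^{1/3})\right)\cdot t/(2n)}{4\overline{t}^{2/3}/\epsilon^{1/3}}\right) < \frac{\epsilon}{n},$$

where the last inequality holds for a sufficiently large constant $c$. □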


3.2 A procedure for deciding whether a vertex is heavy 

In this subsection we provide a procedure for deciding (approximately) whether a given vertex v is 
heavy or light. Recall that a high-level description of the procedure appears in Subsection 1.2.3 of 
the introduction. 

Heavy($v$)

1. If $d_v > 2\overline{m}/(\epsilon\overline{t})^{1/3}$, output heavy.

2. For $i = 1, 2, \ldots, 10\log n$:

   (a) For $j = 1, 2, \ldots, s = 20\overline{m}^{3/2}/(\epsilon^2\overline{t})$:

       i. Select an edge $e \in E_v$ uniformly, independently and at random, and let $u$
          be the smaller endpoint according to the order $\prec$.

       ii. For $k = 1, 2, \ldots, r = \lceil d_u/\sqrt{\overline{m}}\rceil$:

           A. Pick a neighbor $w$ of $u$ uniformly at random. Let $x$ denote the endpoint
              of $e$ that is not $v$.

           B. If $e$ and $w$ form a triangle and $x \prec w$, set $Z_k = d_u$, else $Z_k = 0$.

       iii. Set $Y_j = \frac{1}{r}\sum_{k=1}^{r} Z_k$.

   (b) Set $X_i = \frac{d_v}{s}\sum_{j=1}^{s} Y_j$.

3. If the median of the $X_i$ variables is greater than $\overline{t}^{2/3}/\epsilon^{1/3}$, output heavy, else output
   light.

We have three nested loops, with loop variables $i, j, k$ respectively. We refer to these as “iteration
$i$”, “iteration $j$”, and “iteration $k$”.
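
The procedure can be sketched in code as follows. This is an illustration under several assumptions rather than the authors' implementation: the graph is held in memory as neighbor lists plus an edge set instead of being accessed through queries, the logarithm is taken base 2, the degree threshold is taken to be $2\overline{m}/(\epsilon\overline{t})^{1/3}$ and the median threshold $\overline{t}^{2/3}/\epsilon^{1/3}$, and the function and variable names are hypothetical.

```python
import math
import random
import statistics


def wheel(k):
    """Wheel graph as neighbor lists: hub 0 joined to the cycle 1..k."""
    nbrs = {v: [] for v in range(k + 1)}
    for i in range(1, k + 1):
        for a, b in ((0, i), (i, i % k + 1)):
            nbrs[a].append(b)
            nbrs[b].append(a)
    return nbrs


def heavy(nbrs, edges, v, m_bar, t_bar, eps, rng):
    """Return True if v is declared heavy, False if it is declared light."""
    key = lambda a: (len(nbrs[a]), a)               # total order: degree, then id
    d_v = len(nbrs[v])
    if d_v > 2 * m_bar / (eps * t_bar) ** (1 / 3):  # Step 1
        return True
    if d_v == 0:
        return False
    reps = math.ceil(10 * math.log2(len(nbrs)))
    s = math.ceil(20 * m_bar ** 1.5 / (eps ** 2 * t_bar))
    xs = []
    for _ in range(reps):                           # iteration i
        total = 0.0
        for _ in range(s):                          # iteration j
            x = rng.choice(nbrs[v])                 # random edge e = (v, x)
            u = min(v, x, key=key)                  # smaller endpoint of e
            other = x if u == v else v
            r = math.ceil(len(nbrs[u]) / math.sqrt(m_bar))
            z_sum = 0
            for _ in range(r):                      # iteration k
                w = rng.choice(nbrs[u])
                # e and w form a triangle iff w is adjacent to the other endpoint
                if frozenset((w, other)) in edges and key(x) < key(w):
                    z_sum += len(nbrs[u])
            total += z_sum / r
        xs.append(d_v * total / s)
    return statistics.median(xs) > t_bar ** (2 / 3) / eps ** (1 / 3)  # Step 3


nbrs = wheel(40)
edges = {frozenset((a, b)) for a in nbrs for b in nbrs[a]}
rng = random.Random(0)
hub_is_heavy = heavy(nbrs, edges, 0, m_bar=80, t_bar=40, eps=0.5, rng=rng)
rim_is_heavy = heavy(nbrs, edges, 1, m_bar=80, t_bar=40, eps=0.5, rng=rng)
```

On the wheel with 40 rim vertices the hub lies in 40 triangles and each rim vertex in 2, while the median threshold is about 14.7, so the two calls separate with a wide margin.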

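Lemma 5. For any iteration $i$, $\Pr\left[|X_i - t_v| > \epsilon\cdot\max(t_v, \overline{t}d_v/\overline{m})\right] \le 1/4$.

Proof. Recall that we associate each triangle $(v, x, w) \in T_v$ with $(v, x)$ if $x \prec w$ and with $(v, w)$
otherwise, so that we have $t_v = \sum_{e \in E_v} t_e$. For an edge $e = (v, x)$, $t_e$ is upper bounded by the
number of neighbors $w$ of $x$ such that $x \prec w$. By Claim 1, $t_e \le \sqrt{2m}$.

Fix an iteration $j$ and let $e_j$ denote the edge chosen in the $j$-th iteration and $u_j$ denote its smaller
degree endpoint. We use $\mathcal{E}_j$ to denote the event of $e_j$ being chosen. Conditioned on the event $\mathcal{E}_j$,
the probability that $Z_k$ is non-zero is $t_{e_j}/d_{u_j}$. Hence,

$$E[Z_k \mid \mathcal{E}_j] = \frac{t_{e_j}}{d_{u_j}}\cdot d_{u_j} = t_{e_j}$$

and

$$\mathrm{Var}[Z_k \mid \mathcal{E}_j] \le E[Z_k^2 \mid \mathcal{E}_j] \le d_{u_j}\cdot E[Z_k \mid \mathcal{E}_j].$$

By linearity of expectation,

$$E[Y_j \mid \mathcal{E}_j] = E\left[\frac{1}{r}\sum_{k=1}^{r} Z_k \,\Big|\, \mathcal{E}_j\right] = \frac{1}{r}\sum_{k=1}^{r} E[Z_k \mid \mathcal{E}_j] = t_{e_j}. \tag{3}$$

By the independence of the $Z_k$ variables,

$$\mathrm{Var}[Y_j \mid \mathcal{E}_j] = \mathrm{Var}\left[\frac{1}{r}\sum_{k=1}^{r} Z_k \,\Big|\, \mathcal{E}_j\right] = \frac{1}{r^2}\sum_{k=1}^{r}\mathrm{Var}[Z_k \mid \mathcal{E}_j] \le \frac{1}{r^2}\sum_{k=1}^{r} d_{u_j}\cdot E[Z_k \mid \mathcal{E}_j] = \frac{d_{u_j}}{r}\cdot t_{e_j} \le \sqrt{\overline{m}}\cdot t_{e_j}. \tag{4}$$

The conditioning can be removed to yield

$$E[Y_j] = \sum_{e \in E_v}\frac{1}{d_v}\cdot E[Y_j \mid \mathcal{E}_j] = \frac{1}{d_v}\cdot\sum_{e \in E_v} t_e = \frac{t_v}{d_v}. \tag{5}$$

By the law of total variance, the law of total expectation, the bounds $t_e \le \sqrt{2m}$ and $m \le 6\overline{m}$, and
by Equations (3) and (4):

$$\begin{aligned}
\mathrm{Var}[Y_j] &= E_{e_j}\left[\mathrm{Var}[Y_j \mid \mathcal{E}_j]\right] + \mathrm{Var}_{e_j}\left[E[Y_j \mid \mathcal{E}_j]\right] \\
&\le E_{e_j}\left[\sqrt{\overline{m}}\cdot E[Y_j \mid \mathcal{E}_j]\right] + \mathrm{Var}_{e_j}[t_{e_j}] \\
&\le \sqrt{\overline{m}}\cdot E[Y_j] + E_{e_j}[t_{e_j}^2] \\
&= \sqrt{\overline{m}}\cdot E[Y_j] + \frac{1}{d_v}\sum_{e_j \in E_v} t_{e_j}^2 \\
&\le \sqrt{\overline{m}}\cdot E[Y_j] + \sqrt{12\overline{m}}\cdot E[Y_j] \le 5\sqrt{\overline{m}}\cdot E[Y_j].
\end{aligned} \tag{6}$$

Let $Y = \frac{1}{s}\sum_{j=1}^{s} Y_j$. By Equation (5), $E[Y] = t_v/d_v$. By Equation (6),

$$\mathrm{Var}[Y] = \mathrm{Var}\left[\frac{1}{s}\sum_{j=1}^{s} Y_j\right] = \frac{1}{s^2}\sum_{j=1}^{s}\mathrm{Var}[Y_j] \le \frac{1}{s^2}\sum_{j=1}^{s} 5\sqrt{\overline{m}}\cdot E[Y_j] = \frac{5\sqrt{\overline{m}}}{s}\cdot E[Y] = \frac{5\sqrt{\overline{m}}\cdot(t_v/d_v)}{20\cdot\overline{m}^{3/2}/(\epsilon^2\overline{t})} = \frac{\epsilon^2}{4}\cdot\frac{t_v}{d_v}\cdot\frac{\overline{t}}{\overline{m}}. \tag{7}$$

By Chebyshev's inequality and Equation (7),

$$\Pr\left[\left|Y - \frac{t_v}{d_v}\right| > \epsilon\cdot\max\left(\frac{t_v}{d_v}, \frac{\overline{t}}{\overline{m}}\right)\right] \le \frac{\mathrm{Var}[Y]}{\epsilon^2\cdot\max\left(t_v/d_v, \overline{t}/\overline{m}\right)^2} \le \frac{1}{4},$$

where the last inequality holds because the product of $t_v/d_v$ and $\overline{t}/\overline{m}$ is at most the square of their maximum.
Since $X_i = d_v\cdot Y$, we have that $\Pr\left[|X_i - t_v| > \epsilon\cdot\max(t_v, \overline{t}d_v/\overline{m})\right] \le 1/4$. □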


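Lemma 6. For every vertex $v$, if $v$ is heavy, then a call to Heavy($v$) returns heavy with probability
at least $1 - 1/n^2$. If $v$ is light, then a call to Heavy($v$) returns light with probability at least $1 - 1/n^2$.

Proof. First consider a heavy vertex $v$. Clearly, if $d_v > 2\overline{m}/(\epsilon\overline{t})^{1/3}$, then $v$ is declared heavy.
Therefore, assume that $t_v > 2\overline{t}^{2/3}/\epsilon^{1/3}$ and $d_v \le 2\overline{m}/(\epsilon\overline{t})^{1/3}$, so that $\overline{t}d_v/\overline{m} \le 2\overline{t}^{2/3}/\epsilon^{1/3}$ and hence
$\max(\overline{t}d_v/\overline{m}, t_v) = t_v$. By Lemma 5, and since $\epsilon < 1/2$, for any iteration $i$, $\Pr\left[|X_i - t_v| > \epsilon t_v\right] \le 1/4$.
Therefore, $\Pr\left[X_i \le \overline{t}^{2/3}/\epsilon^{1/3}\right] \le 1/4$, and by Chernoff, the probability that the median of the
$X_i$ variables (where $i = 1, \ldots, 10\log n$) will be greater than $\overline{t}^{2/3}/\epsilon^{1/3}$ is at least $1 - 1/n^2$. Hence
Heavy($v$) outputs heavy with probability at least $1 - 1/n^2$.

Now consider a light vertex $v$. Since $d_v \le 2\overline{m}/(\epsilon\overline{t})^{1/3}$ and $t_v \le \overline{t}^{2/3}/(2\epsilon^{1/3})$, it holds that $\overline{t}d_v/\overline{m} \le
2\overline{t}^{2/3}/\epsilon^{1/3}$, implying that $\max(\overline{t}d_v/\overline{m}, t_v) \le 2\overline{t}^{2/3}/\epsilon^{1/3}$. Therefore, by Lemma 5, $\Pr\left[|X_i - t_v| >
\epsilon(2\overline{t}^{2/3}/\epsilon^{1/3})\right] \le 1/4$, and the probability that the median will be less than $\overline{t}^{2/3}/\epsilon^{1/3}$ is at least
$1 - 1/n^2$. Hence $v$ is declared light with probability at least $1 - 1/n^2$. □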
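
The median step can be sanity-checked numerically. The median of $k$ estimates lands on the wrong side of the threshold only if at least half of them do, and each does so with probability at most $1/4$. The sketch below computes that binomial tail exactly; taking the logarithm base 2 and $n = 1024$ are assumptions made for the example.

```python
import math


def median_failure_probability(k, p_bad):
    """Probability that at least half of k independent trials are bad."""
    half = math.ceil(k / 2)
    return sum(math.comb(k, j) * p_bad ** j * (1 - p_bad) ** (k - j)
               for j in range(half, k + 1))


n = 1024
k = 10 * int(math.log2(n))  # number of repetitions, assuming log base 2
failure = median_failure_probability(k, 0.25)
```

For these values the tail is below $1/n^2$, in line with the bound claimed in the lemma.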

The following is a corollary of Lemma 6. 

Corollary 7. Consider running Heavy for all the vertices in the graph. Let $H$ denote the set of
vertices that are declared heavy and let $L$ denote the set of vertices that are declared light. Then,
with probability at least $1 - 1/n$, the partition $(H, L)$ is appropriate (as defined in Definition 2).

We now turn to analyze the running time of Heavy. The proof will be similar to the complexity 
analysis of the exact triangle counter of Chiba and Nishizeki [CN85]. 

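Lemma 8. If Item 2 in Assumption 1 holds, then for every vertex $v$ the expected running time of
Heavy($v$) is $O^*(\overline{m}^{3/2}/\overline{t})$.

Proof. We first argue that the expected time to generate a single sample of $Y_j$ is $O(1)$. Our query
model allows for selecting an edge in $E_v$ uniformly at random by a single query. If $d_v \le \sqrt{\overline{m}}$,
then the degree of the smaller endpoint for any $e \in E_v$ is at most $\sqrt{\overline{m}}$. Hence a sample is clearly
generated in $O(1)$ time. Suppose that $d_v > \sqrt{\overline{m}}$. If an edge $e = (v, u)$ is sampled, then the runtime
is $O(1 + \min(d_v, d_u)/\sqrt{\overline{m}})$. Hence, the expected runtime to generate $Y_j$ is, up to constant factors,
at most:

$$\frac{1}{d_v}\sum_{u \in \Gamma_v}\left(1 + \frac{\min(d_v, d_u)}{\sqrt{\overline{m}}}\right) \le 1 + \frac{1}{d_v\sqrt{\overline{m}}}\sum_{u \in \Gamma_v} d_u \le 1 + \frac{2m}{\overline{m}} \le 13,$$

where the middle inequality uses $\sum_{u \in \Gamma_v} d_u \le 2m$ and $d_v > \sqrt{\overline{m}}$, and the last inequality follows from Item 2 in Assumption 1.

By the above, each iteration of the ‘for’ loop in Step 2a takes $O(1)$ time in expectation. Therefore,
together, all iterations of Step 2a take $O(\overline{m}^{3/2}/(\epsilon^2\overline{t}))$ time in expectation, and since it is repeated
$O(\log n)$ times, the expected running time of the procedure is $(\overline{m}^{3/2}/\overline{t})\cdot\mathrm{poly}(\log n, 1/\epsilon)$. □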

3.3 Estimating the number of triangles given $\overline{m}$ and $\overline{t}$

We are now ready to present an algorithm Estimate-with-advice that takes $\overline{m}$, $\overline{t}$ as input (“advice”),
and outputs an estimate of $t$. Later, we employ the average degree approximation
algorithm of Feige [Fei06] and a geometric search to get the bona fide algorithm that estimates $t$
without any initial estimates $\overline{m}$ and $\overline{t}$. Recall that a high-level description of the procedure appears
in Subsection 1.2.4 of the introduction. In what follows we rely on the following assumption.

Assumption 2. We will assume that the random coins used by Heavy are fixed in advance, and
that the partition $(H, L)$ as defined in Corollary 7 is indeed appropriate.

By Corollary 7 this assumption only adds $1/n$ to the error probability in all subsequent probability
bounds. Recall that we use $c, c_1, \ldots$ to denote sufficiently large constants.

Recall that $c_H$ is the constant defined in Claim 2.

Estimate-with-advice($\overline{m}$, $\overline{t}$, $\epsilon$)

1. Sample $s_1 = c_1\epsilon^{-3}\log(n/\epsilon)(n/\overline{t}^{1/3})$ vertices, uniformly, independently and at random.
   Denote the chosen multiset $S$.

2. Set up a data structure to enable sampling vertices in $S$ proportional to their degree.

3. For $i = 1, 2, \ldots, s_2 = c_2\epsilon^{-4}(\log^2 n)(\overline{m}^{3/2}/\overline{t})$:

   (a) Sample $v \in S$ proportional to $d_v$ and sample $e \in E_v$ uniformly at random. Let
       $u$ be the smaller endpoint according to the order $\prec$. Let $x$ be the endpoint of
       $e$ that is not $v$.

   (b) If $d_u \le \sqrt{\overline{m}}$ set $r = 1$ with probability $d_u/\sqrt{\overline{m}}$ and set $r = 0$ otherwise. If
       $d_u > \sqrt{\overline{m}}$ set $r = \lceil d_u/\sqrt{\overline{m}}\rceil$.

   (c) Repeat for $j = 1, 2, \ldots, r$:

       i. Pick a neighbor $w$ of $u$ uniformly at random.

       ii. If $e$ and $w$ do not form a triangle, set $Z_j = 0$.

       iii. If $e$ and $w$ form a triangle and $w \prec x$, set $Z_j = 0$.

       iv. If $e$ and $w$ form a triangle $\Delta$ and $x \prec w$: call Heavy for all vertices in $\Delta$,
           and let

           $$Z_j = \begin{cases} 0 & \text{if Heavy}(v) \text{ returned heavy} \\ \max(d_u, \sqrt{\overline{m}})\cdot\mathrm{wt}(\Delta) & \text{otherwise.} \end{cases}$$

   (d) Set $Y_i = \frac{1}{r}\sum_{j=1}^{r} Z_j$. (If $r = 0$, set $Y_i = 0$.)

4. Output $X = \frac{n}{s_1}\cdot\left(\sum_{v \in S} d_v\right)\cdot\frac{1}{s_2}\sum_{i=1}^{s_2} Y_i$.

Theorem 9. For $X$ as defined in Step 4 of Estimate-with-advice, $E[X] \le t$. Moreover, if $(H, L)$
is appropriate and Assumption 1 holds, then $E[X] \in [t(1 - c_H\cdot\epsilon), t]$ and $\Pr\left[X < t(1 - 3c_H\cdot\epsilon)\right] <
3\epsilon/\log n$.

There are three “levels” of randomness. First is the choice of $S$, second is the choice of $e$
(Step 3a), and finally the $Z_j$'s. When analyzing the randomness in any level, we condition on the
previous levels. Before proving the theorem, we present the following definition and claim.

Definition 5. Let $S$ be a multiset of $s_1$ vertices. We say that $S$ is good if $\sum_{v \in S}\mathrm{wt}(v)/s_1 \ge t(1 -
2c_H\cdot\epsilon)/n$. We say that $S$ is great if in addition to being good, $d_S = \sum_{v \in S} d_v \le s_1(2m/n)(\log n/\epsilon)$.

Claim 10. Fix the choice of the set S, and let ds = Y dy For every i, E[yj | 5] = dg^ Y wt{v) 

ves ves 

and Var[yj | S] < ■ E[yj | S']. 

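Proof. This is similar to the argument in Lemma 5. Let $v_i$ be the chosen vertex in the $i$-th
iteration of the algorithm, and let $e_i$ be the chosen edge. We refer to this event by
$\mathcal{E}_i$, and condition on the set $S$ being chosen and the event $\mathcal{E}_i$. Denote by
$u_i$ the lower degree endpoint of $e_i$. If Heavy($u_i$) = heavy, then
$E[Y_i \mid S, \mathcal{E}_i] = 0$ and $Var[Y_i \mid S, \mathcal{E}_i] = 0$. If Heavy($u_i$) = light,
then there are two possibilities. If $d_{u_i} \le \sqrt{\bar m}$ then

$$E[Y_i \mid S, \mathcal{E}_i] = \frac{d_{u_i}}{\sqrt{\bar m}} \sum_{\Delta \in T_{e_i}} \frac{1}{d_{u_i}} \cdot \sqrt{\bar m} \cdot wt(\Delta) = wt(T_{e_i}).$$

Since the maximum value of $Y_i$ in this case is at most $\sqrt{\bar m}$,

$$Var[Y_i \mid S, \mathcal{E}_i] \le E[Y_i^2 \mid S, \mathcal{E}_i] \le \sqrt{\bar m} \cdot E[Y_i \mid S, \mathcal{E}_i]. \qquad (8)$$

Now consider the case that $d_{u_i} > \sqrt{\bar m}$. In order to bound the variance of the $Y_i$
variables we first analyze the expectation and variance of the $Z_j$ variables. Note that $Z_j$ is
non-zero when a triangle $\Delta \in T_{e_i}$ is found. It holds that

$$E[Z_j \mid S, \mathcal{E}_i] = \sum_{\Delta \in T_{e_i}} \frac{1}{d_{u_i}} \cdot d_{u_i} \cdot wt(\Delta) = wt(T_{e_i}),$$

and

$$Var[Z_j \mid S, \mathcal{E}_i] \le d_{u_i} \cdot E[Z_j \mid S, \mathcal{E}_i].$$

By linearity of expectation,

$$E[Y_i \mid S, \mathcal{E}_i] = wt(T_{e_i}). \qquad (9)$$

By independence of the $(Z_j \mid S, \mathcal{E}_i)$ variables, linearity of expectation and
Equation (9), and since $r = \lceil d_{u_i}/\sqrt{\bar m} \rceil \ge d_{u_i}/\sqrt{\bar m}$,

$$Var[Y_i \mid S, \mathcal{E}_i] = Var\left[\frac{1}{r} \sum_{j=1}^{r} Z_j \,\Big|\, S, \mathcal{E}_i\right] = \frac{1}{r^2} \sum_{j=1}^{r} Var[Z_j \mid S, \mathcal{E}_i] \le \frac{d_{u_i}}{r^2} \sum_{j=1}^{r} E[Z_j \mid S, \mathcal{E}_i] \le \sqrt{\bar m} \cdot E[Y_i \mid S, \mathcal{E}_i]. \qquad (10)$$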


We remove the conditioning on £i: 


E[yi \ S]= = ds^ Y Yl ^wt(u) . 

v&SnL ^ ^ eeE„ vGSnLeeEv V&S 

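Recall that by Claim 1, $wt(e) \le \sqrt{2m}$. Therefore, by the law of total variance, the law of
total expectation, the bound $m \le 6\bar m$, and Equations (8) and (10),

$$Var[Y_i \mid S] = E_{e_i}\left[Var[Y_i \mid S, \mathcal{E}_i]\right] + Var_{e_i}\left[E[Y_i \mid S, \mathcal{E}_i]\right]$$
$$\le E_{e_i}\left[\sqrt{\bar m} \cdot E[Y_i \mid S, \mathcal{E}_i]\right] + E_{e_i}\left[wt(T_{e_i})^2\right]$$
$$\le \sqrt{\bar m} \cdot E[Y_i \mid S] + \sqrt{2m} \cdot E_{e_i}[wt(T_{e_i})] \le 5\sqrt{\bar m} \cdot E[Y_i \mid S].$$

This completes the proof of Claim 10. □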
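The constant 5 in the last inequality can be checked by hand. The short derivation below is a worked restatement, assuming only the facts quoted in the proof: $wt(T_{e_i}) \le \sqrt{2m}$, $m \le 6\bar m$, and $E_{e_i}[wt(T_{e_i})] = E[Y_i \mid S]$ (which follows from Equation (9) and the law of total expectation).

```latex
\begin{align*}
E_{e_i}\!\left[wt(T_{e_i})^2\right]
  &\le \sqrt{2m}\cdot E_{e_i}\!\left[wt(T_{e_i})\right]
   \le \sqrt{12\,\bar m}\cdot E[Y_i \mid S], \\
\sqrt{\bar m} + \sqrt{12\,\bar m}
  &= \left(1 + 2\sqrt{3}\right)\sqrt{\bar m}
   \approx 4.47\,\sqrt{\bar m} \;<\; 5\sqrt{\bar m}.
\end{align*}
```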


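Proof of Theorem 9: For a fixed set $S$, let $X_S$ denote the sum
$X_S = \frac{n}{s_1 s_2} \cdot \left( \sum_{v \in S} d_v \right) \cdot \sum_{i=1}^{s_2} Y_i$
(as defined in Step 4 of Estimate-with-advice), given that the set $S$ is chosen in Step 1. By the
definition of $X_S$ and by Claim 10,

$$E[X_S] = \frac{n \cdot d_S}{s_1} \cdot E[Y_i \mid S] = \frac{n}{s_1} \sum_{v \in S} wt(v). \qquad (11)$$

By Theorem 4, $E_S\left[\frac{1}{s_1} \sum_{v \in S} wt(v)\right] \in [t(1 - c_H \cdot \epsilon)/n, \, t/n]$,
implying that, over the choice of $S$, $E[X_S] \in [t(1 - c_H \cdot \epsilon), t]$.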
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By Theorem 4, Definition 5 and Assumption 2, S is good with probability at least 1 — e^/n. The 
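expected value, over $S$, of $d_S$ is $E_S[d_S] = s_1 \cdot (2m/n)$. By Markov’s inequality,

$$\Pr_S\left[ d_S > s_1 \cdot \frac{2m}{n} \cdot \frac{\log n}{\epsilon} \right] < \frac{\epsilon}{\log n}.$$

By taking a union bound, the probability that $S$ is great is at least $1 - 2\epsilon/\log n$. For
a fixed choice of $S$, let $Y_S = \frac{1}{s_2} \sum_{i=1}^{s_2} Y_i$. By the independence of the
$Y_i$ variables and by Claim 10,

$$Var[Y_S] = \frac{1}{s_2^2} \sum_{i=1}^{s_2} Var[Y_i \mid S] \le \frac{1}{s_2^2} \sum_{i=1}^{s_2} 5\sqrt{\bar m} \cdot E[Y_i \mid S] = \frac{5\sqrt{\bar m}}{s_2} \cdot E[Y_S]. \qquad (12)$$

By Chebyshev’s inequality and Equation (12), we get that

$$\Pr\left[ |Y_S - E[Y_S]| > \epsilon E[Y_S] \right] \le \frac{Var[Y_S]}{\epsilon^2 \cdot E[Y_S]^2} \le \frac{5\sqrt{\bar m} \cdot E[Y_S]}{\epsilon^2 \cdot s_2 \cdot E[Y_S]^2} = \frac{5\sqrt{\bar m}}{\epsilon^2 \cdot s_2 \cdot E[Y_S]},$$

into which we substitute the setting of $s_2$ from Step 3. By Claim 10,
$E[Y_S] = d_S^{-1} \sum_{v \in S} wt(v)$, which for a great $S$ is at least

$$\frac{s_1 \cdot t(1 - 2c_H \cdot \epsilon)/n}{s_1 (2m/n)(\log n/\epsilon)} \ge \frac{t}{4m} \cdot \frac{\epsilon}{\log n}.$$

Therefore, by Assumption 1, for a sufficiently large constant $c_2$,

$$\Pr\left[ |Y_S - E[Y_S]| > \epsilon E[Y_S] \right] < \frac{\epsilon}{\log n}.$$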

By the definition of $X_S$ in Step 4 of the algorithm, $X_S$ is just a scaling of $Y_S$. Therefore,

$$\Pr\left[ |X_S - E[X_S]| > \epsilon E[X_S] \right] < \frac{\epsilon}{\log n}.$$

By Equation (11), $E[X_S] = \frac{n}{s_1} \sum_{v \in S} wt(v)$, which for a great $S$ is at least
$t(1 - 2c_H \cdot \epsilon)$. Hence, for a great $S$,

$$\Pr\left[ X_S < (1 - 3c_H \cdot \epsilon) \cdot t \right] < \frac{\epsilon}{\log n}.$$

The probability of $S$ not being great is at most $2\epsilon/\log n$. We apply the union bound to
remove the conditioning, so we get

$$\Pr\left[ X < (1 - 3c_H \cdot \epsilon) \cdot t \right] < \frac{3\epsilon}{\log n},$$

which completes the proof. □


Theorem 11. If Item 2 in Assumption 1 holds then the expected running time of Estimate-with-advice
is $O^*(n/\bar t^{1/3} + \bar m^{3/2}/\bar t)$.

Proof. The sampling of $S$ is done in $O^*(n/\bar t^{1/3})$ time. Generating the $Z_j$ variables,
without the calls to Heavy, takes time $O^*(\bar m^{3/2}/\bar t)$ in expectation, by an argument
identical to that in the proof of Lemma 8. Therefore, it remains to bound the running time
resulting from calls to Heavy.

Let us compute the expected number of triangles found during the run of the algorithm. In each
iteration $i$, conditioned on choosing an edge $e$, the expected number of triangles found is at most
$2(d_u/\sqrt{\bar m})(t_e/d_u) = 2t_e/\sqrt{\bar m}$. Averaging over the edges, the expected number
of triangles found in a single iteration is at most $O(t/(m\sqrt{\bar m}))$, which by Item 2 in
Assumption 1 is $O(t/\bar m^{3/2})$. There are
$O(\bar m^{3/2}/\bar t) \cdot \mathrm{poly}(\log n, 1/\epsilon)$ iterations, leading to a total of
$O^*(1)$ expected triangles. Thus, there are $O^*(1)$ expected calls to Heavy, each taking
$O^*(\bar m^{3/2}/\bar t)$ time by Lemma 8. Together with the above, we get an expected running
time of $O(n/\bar t^{1/3} + \bar m^{3/2}/\bar t) \cdot \mathrm{poly}(\log n, 1/\epsilon)$. □

3.4 The final algorithm 

We are now ready to present an algorithm that requires no prior knowledge regarding m and t. 


Estimate($\epsilon$)

1. Let $\epsilon' = \epsilon/3c_H$, where $c_H$ is the constant defined in Claim 2.

2. Invoke Feige’s algorithm [Fei06] for approximating the average degree of a graph
   $10 \log n$ times. Let $\bar d$ be the median value of all invocations.

3. Let $\bar m = n\bar d/2$.

4. Let $\tilde t = n^3$.

5. While $\tilde t \ge 1$:

   (a) For $\bar t = n^3, n^3/2, n^3/4, \ldots, \tilde t$:

       i. For $i = 1, \ldots, c\epsilon^{-1} \log\log n$:

          A. Let $X_i$ = Estimate-with-advice($\epsilon'$, $\bar m$, $\bar t$).

       ii. Let $X = \min_i \{X_i\}$.

       iii. If $X \ge \bar t$, return $X$.

   (b) Let $\tilde t = \tilde t/2$.
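The control flow of Step 5 (an outer loop that halves the threshold and an inner loop that re-runs every larger guess) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `estimate_with_advice` is a caller-supplied stand-in taking only the guess $\bar t$, `reps` stands for the repetition count of Step 5(a)i, and Steps 1 to 3 (computing $\epsilon'$ and $\bar m$) are omitted.

```python
def estimate(n, estimate_with_advice, reps):
    """Double-loop search over guesses for the triangle count (sketch of Step 5)."""
    t_tilde = float(n) ** 3
    while t_tilde >= 1:
        # Inner loop: re-run every guess from n^3 down to the current threshold.
        t_bar = float(n) ** 3
        while t_bar >= t_tilde:
            # Take the minimum over `reps` independent estimates for this guess.
            x = min(estimate_with_advice(t_bar) for _ in range(reps))
            if x >= t_bar:
                return x
            t_bar /= 2
        t_tilde /= 2
    return None  # no guess was accepted
```

With a stub that always returns 40 and $n = 10$, the search stops at the first guess not exceeding 40, which is $1000/32 = 31.25$.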


Before analyzing the correctness and running time of the algorithm, we present the following 
simple proposition, whose proof we give for the sake of completeness. 

Proposition 12. For every graph G, t < . 

Proof. 

* = 24^ I < i j 2 Vm • 2reT, + 2v^ ^ J < ^rer^/^ 

v&V \ v: dv>^/m v: / \ v: dv^\/m J 

□ 
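A bound of this shape is easy to sanity-check by brute force. The sketch below counts triangles directly and compares against $m^{3/2}$ on complete graphs, where the triangle density is highest; the helper names are illustrative, and the inequality $t \le m^{3/2}$ tested here is a standard bound that may differ from the proposition's constant.

```python
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in a graph on vertices 0..n-1 by checking every vertex triple."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def complete_graph(n):
    """Edge list of the complete graph on n vertices."""
    return list(combinations(range(n), 2))
```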

Theorem 13. Algorithm Estimate($\epsilon$) returns a value $X$, such that
$(1 - \epsilon)t \le X \le (1 + \epsilon)t$, with probability at least 5/6. The expected query
complexity of the algorithm is $O^*(n/t^{1/3} + \min\{m, m^{3/2}/t\})$ and the expected running
time of the algorithm is $O^*(n/t^{1/3} + m^{3/2}/t)$.

Proof. We first prove that the value of $X$ is as stated in the theorem. Let $d_{avg}$ denote the
average degree of vertices in $G$. The algorithm from [Fei06] returns a value $\bar d$ such that,
with probability at least 2/3, $\bar d \in [d_{avg}/(2 + \gamma), d_{avg}]$ for a constant
$\gamma$. Since we take the median value of $10 \log n$ invocations, it follows from Chernoff’s
inequality that $\bar m$ is as stated in Item 2 of Assumption 1 with probability at least
$1 - 1/\mathrm{poly}(n)$. Assume that this is indeed the case.

Before analyzing the algorithm Estimate as described above, first consider executing Step 5a
with $\bar t = \tilde t$ only. That is, rather than running both an outer loop over decreasing
values of $\tilde t$ and an inner loop over decreasing values of $\bar t$, we only run a single
loop over decreasing values of $\bar t$, starting with $\bar t = n^3$. By the first part of
Theorem 9 and by Markov’s inequality, for each value of $\bar t$ and for each $i$,
$\Pr[X_i \le (1 + \epsilon)t] \ge \epsilon/2$, where $X_i$ is as defined in Step 5(a)iA.
Therefore, for each value of $\bar t$, the minimum estimate $X$ (as defined in Step 5(a)ii) is at
most $(1 + \epsilon)t$, with probability at least $1 - 1/\log^3 n$. It follows that for each
$\bar t$ such that $\bar t \ge 2t$, we have that $X < \bar t$ with probability at least
$1 - 1/\log^3 n$, and the algorithm will continue with $\bar t = \bar t/2$. Once we reach a value
of $\bar t$ for which $t/4 \le \bar t \le t/2$, Item 1 in Assumption 1, regarding $\bar t$, holds.
By the second part of Theorem 9, $X_i \in [(1 - \epsilon)t, (1 + \epsilon)t]$ for every $i$ with
probability at least $1 - c/\log n$. Hence, we have that

$$\bar t \le \frac{t}{2} \le (1 - \epsilon)t \le X \le (1 + \epsilon)t,$$

with probability at least $1 - c/\log n$. Therefore, we halt and return a correct $X$.

If, however, we do reach a value $\bar t$ such that $\bar t < t/4$, then since Assumption 1 does
not hold, we cannot lower bound $X$, implying that we can no longer bound the probability that
$X < \bar t$. Therefore we might continue running with decreasing values of $\bar t$, causing the
running time to exceed the desired bound of $O^*(n/t^{1/3} + m^{3/2}/t)$. In order to avoid this
scenario, we run both an outer loop over $\tilde t$ and an inner loop over $\bar t$. Specifically,
starting with $\tilde t = n^3$, whenever we halve $\tilde t$, we run over all values of
$\bar t = n^3, n^3/2, \ldots$, until we reach $\tilde t$. This implies that for every value of
$\bar t \ge 2t$ the probability of returning an incorrect estimate, that is, one outside the range
$(1 - \epsilon)t \le X \le (1 + \epsilon)t$, is at most $1/\log^3 n$. On the other hand, for
values of $\bar t$ such that $\bar t \le t/2$ the probability of returning a correct estimate
(within $(1 - \epsilon)t \le X \le (1 + \epsilon)t$) is at least $1 - c/\log n$. A union bound
over all failure probabilities gives a success probability of at least 5/6.

We now turn to analyze the query complexity and running time of the algorithm. By [Fei06],
the expected running time of the average degree approximation algorithm is $O^*(n/\sqrt{m})$. By
Theorem 11, conditioned on $\bar m$ satisfying Item 2 in Assumption 1, the expected running time
of Estimate-with-advice($\epsilon$, $\bar m$, $\bar t$) is
$O^*(n/\bar t^{1/3} + \bar m^{3/2}/\bar t)$. It follows from Proposition 12 that
$n/\sqrt{m} = O(n/t^{1/3})$, implying that the running time is determined by the value of
$\bar m$ and by the smallest value of $\bar t$ that Estimate-with-advice($\epsilon$, $\bar m$,
$\bar t$) is invoked with.

Recall that whenever we halve the value of $\tilde t$, we run with all values
$\bar t = n^3, n^3/2, \ldots$. This, together with the fact that when running with
$t/4 \le \bar t \le t/2$ we halt with probability at least $1 - c/\log n$, implies that the
probability of reaching a value $\tilde t = t/2^k$ is at most $(c/\log n)^k$. Therefore, the
expected running time, conditioned on $\bar m$ satisfying Item 2 in Assumption 1, is bounded by

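$$\log n \cdot O^*\left( \frac{n}{t^{1/3}} + \frac{m^{3/2}}{t} \right) + \sum_{k=1}^{\log n} (c/\log n)^k \cdot 2^k \cdot O^*\left( \frac{n}{t^{1/3}} + \frac{m^{3/2}}{t} \right) = O^*\left( \frac{n}{t^{1/3}} + \frac{m^{3/2}}{t} \right).$$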


Now consider the value of fre computed in Step 3 of Estimate(e). As stated previously, with 
probability at least 1 — 1/poly (re) (e.g., 1 — 1/re^), the estimate rre is within a constant factor from 
rre. Therefore the expected running time of the algorithm (without the conditioning on the value 
of rre) is bounded by 



+ T . 0(n=) = O- 










Observe that we can always assume that the algorithm does not perform queries it can answer by 
itself. That is, we can allow the algorithm to save all the information it obtained from past queries, 
and assume it does not query for information it can deduce from its past queries. Further observe 
that any pair query is preceded by a neighbor query. Therefore, if at any point the algorithm 
performs more than 2m queries, it can abort. It follows that the expected query complexity is 


$$O^*\left( \frac{n}{t^{1/3}} + \min\left\{ m, \frac{m^{3/2}}{t} \right\} \right).$$


□ 


4 A Lower Bound 

In this section we present a lower bound on the number of queries necessary for estimating the
number of triangles in a graph. Since we sometimes refer to the number of triangles in different
graphs, we use the notation $t(G)$ for the number of triangles in a graph $G$. Our lower bound
matches our upper bound in terms of the dependence on $n$, $m$ and $t(G)$, up to polylogarithmic
factors in $n$ and the dependence on $1/\epsilon$. In what follows, when we refer to approximation
algorithms for the number of triangles in a graph, we mean multiplicative-approximation algorithms
that output with high constant probability an estimate $\hat t$ such that
$t(G)/C \le \hat t \le C \cdot t(G)$ for some predetermined approximation factor $C$.

We consider multiplicative-approximation algorithms that are allowed the following three types
of queries: degree queries, pair queries and random new-neighbor queries. Degree queries and pair
queries are as defined in Section 2. A random new-neighbor query $q_i$ is a single vertex $u$ and
the corresponding answer is a vertex $v$ such that $(u, v) \in E$ and the edge $(u, v)$ is selected
uniformly at random among the edges incident to $u$ that have not yet been observed by the
algorithm. In Corollary 34 we show that this implies a lower bound when the algorithm may perform
(standard) neighbor queries instead of random new-neighbor queries.

We first give a simple lower bound that depends on n and t{G). 

Theorem 14. Any multiplicative-approximation algorithm for the number of triangles in a graph
must perform $\Omega\left( n/t(G)^{1/3} \right)$ queries, where the allowed queries are degree
queries, pair queries and random new-neighbor queries.

Proof. For every $n$ and every $1 \le t \le \binom{n}{3}$ we next define a graph $G_1$ and a
family of graphs $\mathcal{G}_2$ for which the following holds. The graph $G_1$ is the empty graph
over $n$ vertices. In $\mathcal{G}_2$, each graph consists of a clique of size $t^{1/3}$ and an
independent set of size $n - t^{1/3}$. See Figure 1 for an illustration. Within $\mathcal{G}_2$
the graphs differ only in the labeling of the vertices. By construction, $G_1$ contains no
triangles and each graph in $\mathcal{G}_2$ contains $\Theta(t)$ triangles. Clearly, unless the
algorithm “hits” a vertex in the clique it cannot distinguish between the two cases. The
probability of hitting such a vertex in a graph selected uniformly at random from $\mathcal{G}_2$
is $t^{1/3}/n$. Thus, in order for this event to occur with high constant probability,
$\Omega(n/t^{1/3})$ queries are necessary. □
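The counting behind this argument can be illustrated numerically. The sketch below is a simplification that models only independent, uniformly random vertex probes (not the full query model of the theorem); `miss_probability` is a hypothetical helper name.

```python
def miss_probability(n, clique_size, queries):
    """Probability that `queries` independent uniform vertex probes all miss the clique."""
    return (1 - clique_size / n) ** queries
```

For $n = 1000$ and a clique of size 10 (so $t \approx 10^3$ and $n/t^{1/3} = 100$), ten probes miss the clique with probability above 0.9, whereas a few hundred probes are needed before a hit becomes likely.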

We next state our main theorem. 

Theorem 15. Any multiplicative-approximation algorithm for the number of triangles in a graph
must perform at least $\Omega\left( \min\left\{ m, m^{3/2}/t(G) \right\} \right)$ queries, where
the allowed queries are degree queries, pair queries and random new-neighbor queries.

For every n, every 1 < m < ( 2 ) and every 1 < t < min | ( 3 ), we define a graph Gi and 

a family of graphs Q 2 for which the following holds. The graph Gi and all the graphs in Q 2 have 
n vertices and m edges. For the graph Gi, t{Gi) = 0, and for every graph G € G 2 , t{G) = Q{t). 
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Figure 1: An illustration of the two families. 

We prove it is necessary to perform $\Omega\left( \min\left\{ m, m^{3/2}/t \right\} \right)$
queries in order to distinguish with high constant probability between $G_1$ and a random graph in
$\mathcal{G}_2$. For the sake of simplicity, in everything that follows we assume that $\sqrt{m}$
is even.

We prove that for values of $t$ such that $t < \sqrt{m}$ at least $\Omega(m)$ queries are
required, and for values of $t$ such that $t \ge \sqrt{m}$ at least $\Omega(m^{3/2}/t)$ queries are
required.

We delay the discussion of the former case to Subsection 4.4, and start with the case that
$t \ge \sqrt{m}$. Our construction of $\mathcal{G}_2$ depends on the value of $t$ as a function of
$m$, where we deal separately with the following two ranges of $t$:

1. $t \in [\Omega(m), O(m^{3/2})]$.

2. $t \in [\Omega(\sqrt{m}), O(m)]$.

We prove that for every $t$ as above, $\Omega(m^{3/2}/t)$ queries are needed in order to
distinguish between the graph $G_1$ and a random graph in $\mathcal{G}_2$. Observe that by
Proposition 12, for every graph $G$, it holds that $t(G) = O(m^{3/2})$. Hence, the above ranges
indeed cover all the possible values of $t$ as a function of $m$.

A high level discussion of the lower bound. The constructions for the different ranges of
$t \ge \sqrt{m}$ are all based on the same basic idea, and have the following in common. In all
constructions for $t$ as above, $G_1$ consists of a complete bipartite graph $(L \cup R, E)$ with
$|L| = |R| = \sqrt{m}$ and an independent set of $n - 2\sqrt{m}$ vertices. The basic structure of
the graphs in the family $\mathcal{G}_2$ is the same as that of $G_1$ with the following
modifications:

• For every value of $t$, we add $t/\sqrt{m}$ edges between vertices in $L$ (and similarly in
$R$). Since each edge contributes (roughly) $\sqrt{m}$ triangles, this gives the desired total
number of triangles in the graph. In the case that $t = m$ this is done by adding a perfect
matching within $L$ and a perfect matching within $R$. In the case that $t > m$ we add several
such perfect matchings, and in the case that $\sqrt{m} \le t \le m/4$ we add a (non-perfect)
matching of size $t/\sqrt{m}$.

• In order to maintain the degrees of all the vertices in the bipartite component, we remove
edges between vertices in $L$ and $R$.

For an illustration of the case $t = m$, see Figure 2. In what follows we assume that the algorithm
knows in advance which vertices are in $L$ and which are in $R$, and consider only the bipartite
component of the graphs. In order to give the intuition for the $\Omega(m^{3/2}/t)$ lower bound we
consider each type of query separately, starting with degree queries.

Since both in the graph $G_1$ and in all the graphs in $\mathcal{G}_2$, all the vertices in
$L \cup R$ have the same degree (of $\sqrt{m}$), degree queries do not reveal any information that
is useful for distinguishing between the two.

As for pair queries, unless the algorithm queries a pair in $L \times L$ (or $R \times R$) and
receives a positive answer, or queries a pair in $L \times R$ and receives a negative answer, the
algorithm cannot distinguish between the bipartite component of the graph $G_1$ and those of the
graphs in $\mathcal{G}_2$. We refer to these pairs as witness pairs. Roughly speaking, since there
are $\Omega(t/\sqrt{m})$ such pairs, and $m$ pairs in total, it takes $\Omega(m^{3/2}/t)$ queries
in order to “catch a witness pair”.

We are left to deal with neighbor queries. Here too, distinguishing between the graph $G_1$ and
the graphs in $\mathcal{G}_2$ can be done by “catching a witness”, that is, if the algorithm
queries for a neighbor of a vertex in $L$ and the answer is another vertex in $L$ (analogously for
a vertex in $R$). As before, the probability of hitting such a witness pair is small. However,
there is another source of difference resulting from neighbor queries. When the algorithm queries a
vertex $u \in L$ there is a difference in the conditional distribution on answers $v \in R$ when
the answer is according to the graph $G_1$ or according to a graph in the family $\mathcal{G}_2$.
The reason for the difference is that in the graph $G_1$ every vertex has exactly $\sqrt{m}$
neighbors on the opposite side, while for graphs in $\mathcal{G}_2$, each vertex has
$\Omega(\sqrt{m} - t/m)$ neighbors on the opposite side (for the range
$\sqrt{m} \le t \le O(m)$ this is true on average). We prove that this difference is sufficiently
small so as to ensure the $\Omega(m^{3/2}/t)$ lower bound.

Our formal analysis is based on defining two processes that interact with an algorithm for
approximating the number of triangles, denoted ALG. The first process answers queries according
to $G_1$, and the second process answers queries while constructing a uniformly selected graph in
$\mathcal{G}_2$. An interaction between ALG and each of these processes induces a distribution
over sequences of queries and answers. We prove that if the number of queries performed by ALG is
smaller than $m^{3/2}/(ct)$ for a sufficiently large constant $c$, then the statistical distance
between the two distributions is a small constant.

We start by addressing the case that t = m in Subsection 4.1, and deal with the case that 
m < t < in Subsection 4.2, and with the case that ^/m < t ^ in Subsection 4.3. 

Before embarking on the proof for $t = m$, we introduce the notion of a knowledge graph (as
defined previously in, e.g., [GR02]), which will be used in all lower bound proofs. Let ALG be an
algorithm for approximating the number of triangles, which performs $Q$ queries. Let $q_t$ denote
its $t$-th query and let $a_t$ denote the corresponding answer. Then ALG is a (possibly
probabilistic) mapping from query-answer histories $\pi = ((q_1, a_1), \ldots, (q_t, a_t))$ to
$q_{t+1}$, for every $t < Q$, and to $\mathbb{N}$ for $t = Q$.

We assume that the mapping determined by the algorithm is determined only on histories that
are consistent with the graph $G_1$ or one of the graphs in $\mathcal{G}_2$. Any query-answer
history $\pi$ of length $t$ can be used to define a knowledge graph $G^{kn}_\pi$ at time $t$.
Namely, the vertex set of $G^{kn}_\pi$ consists of $n$ vertices. For every new-neighbor query $u_i$
answered by $v_i$ for $i \le t$, the knowledge graph contains the edge $(u_i, v_i)$, and similarly
for every pair query $(u_j, v_j)$ that was answered by 1. In addition, for every pair query
$(u_i, v_i)$ that is answered by 0, the knowledge graph maintains the information that
$(u_i, v_i)$ is a non-edge. The above definition of the knowledge graph is a slight abuse of the
notation of a graph since $G^{kn}_\pi$ is a subgraph of the graph tested by the algorithm, but it
also contains additional information regarding queried pairs that are not edges. For a vertex $u$,
we denote its set of neighbors in the knowledge graph by $\Gamma^{kn}_\pi(u)$, and let
$d^{kn}_\pi(u) = |\Gamma^{kn}_\pi(u)|$. We denote by $N^{kn}_\pi(u)$ the set of vertices $v$ such
that $(u, v)$ is either an edge or a non-edge in $G^{kn}_\pi$.
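The bookkeeping described above can be made concrete with a small data structure. This is a sketch under assumptions: the class and method names are illustrative, vertices are arbitrary hashable labels, and the history is fed in one answered query at a time.

```python
class KnowledgeGraph:
    """Edges and non-edges revealed so far by a query-answer history."""

    def __init__(self):
        self.edges = set()      # pairs known to be edges
        self.non_edges = set()  # pairs known to be non-edges

    def record_neighbor(self, u, v):
        """A new-neighbor query on u was answered by v, so (u, v) is an edge."""
        self.edges.add(frozenset((u, v)))

    def record_pair(self, u, v, answer):
        """A pair query on (u, v) was answered by 1 (edge) or 0 (non-edge)."""
        target = self.edges if answer == 1 else self.non_edges
        target.add(frozenset((u, v)))

    def neighbors(self, u):
        """The set of known neighbors of u (Gamma in the text)."""
        return {w for e in self.edges if u in e for w in e if w != u}

    def degree(self, u):
        """The number of known neighbors of u (d in the text)."""
        return len(self.neighbors(u))

    def resolved(self, u):
        """All v for which (u, v) is a known edge or a known non-edge (N in the text)."""
        known = self.edges | self.non_edges
        return {w for p in known if u in p for w in p if w != u}
```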
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4.1 A lower bound for t = m 

4.1.1 The lower-bound construction 

The graph $G_1$ has two components. The first component is a complete bipartite graph with
$\sqrt{m}$ vertices on each side, i.e., $K_{\sqrt{m},\sqrt{m}}$, and the second component is an
independent set of size $n - 2\sqrt{m}$. We denote by $L$ the set of vertices
$\ell_1, \ldots, \ell_{\sqrt{m}}$ on the left-hand side of the bipartite component and by $R$ the
set of vertices $r_1, \ldots, r_{\sqrt{m}}$ on its right-hand side. The graphs in the family
$\mathcal{G}_2$ have the same basic structure with a few modifications. We first choose for each
graph a perfect matching between the two sides $R$ and $L$ and remove the edges of this matching
from the graph. We refer to the removed matching as the “red matching” and its pairs as “crossing
non-edges” or “red pairs”. Now, we add two perfect matchings, one from $L$ to $L$ and one from $R$
to $R$. We refer to these matchings as the blue matchings and their edges as “non-crossing edges”
or “blue pairs”. Thus for each choice of three perfect matchings as defined above, we have a
corresponding graph in $\mathcal{G}_2$.

Consider a graph $G \in \mathcal{G}_2$. Clearly, every blue edge participates in $\sqrt{m} - 2$
triangles. Since every triangle in the graph contains exactly one blue edge, and the two blue
matchings contain $\sqrt{m}/2$ edges each, there are
$2 \cdot \frac{\sqrt{m}}{2} \cdot (\sqrt{m} - 2) = \Theta(m)$ triangles in $G$.
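The construction can be checked by brute force on small instances. The sketch below builds one arbitrary member of the family (identity red matching, blue matchings pairing consecutive vertices) for $k = \sqrt{m}$, restricted to the bipartite component; the function names and the particular choice of matchings are assumptions made for illustration.

```python
from itertools import combinations


def build_g2_instance(k):
    """One graph of the family for t = m, with k = sqrt(m) even; L = 0..k-1, R = k..2k-1."""
    left = list(range(k))
    right = list(range(k, 2 * k))
    red = {(left[i], right[i]) for i in range(k)}  # removed crossing pairs
    edges = {(u, v) for u in left for v in right if (u, v) not in red}
    for side in (left, right):  # add a blue perfect matching inside each side
        edges |= {(side[i], side[i + 1]) for i in range(0, k, 2)}
    return edges


def degrees(edges, n):
    """Degree of every vertex 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def count_triangles(edges, n):
    """Brute-force triangle count over all vertex triples."""
    present = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= present
    )
```

For $k = 4$ the instance has $m = 16$ edges, every vertex has degree 4, and the triangle count is $4 \cdot 2 = 8$, matching $\sqrt{m}(\sqrt{m} - 2)$.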



Figure 2: An illustration of the family Q 2 for t = m. 


4.1.2 Definition of the processes Pi and P 2 

In what follows we describe two random processes, $P_1$ and $P_2$, which interact with an
arbitrary algorithm ALG. The process $P_1$ answers ALG’s queries consistently with $G_1$. The
process $P_2$ answers ALG’s queries while constructing a uniformly selected random graph from
$\mathcal{G}_2$. We assume without loss of generality that ALG does not ask queries whose answers
can be derived from its knowledge graph, since such queries give it no new information. For
example, ALG does not ask a pair query about a pair of vertices that are already known to be
connected by an edge due to a neighbor query. Also, we assume ALG knows in advance which vertices
belong to $L$ and which to $R$, so that ALG need not query vertices in the independent set. Since
the graphs in $\mathcal{G}_2$ differ from $G_1$ only in the edges of the subgraph induced by
$L \cup R$, we think of $G_1$ and graphs in $\mathcal{G}_2$ as consisting only of this subgraph.
Finally, since in our constructions all the vertices in $L \cup R$ have the same degree of
$\sqrt{m}$, we assume that no degree queries are performed.

For every $Q$, every $t \le Q$ and every query-answer history $\pi$ of length $t - 1$, the process
$P_1$ answers the $t$-th query of the algorithm consistently with $G_1$. Namely:

• For a pair query $q_t = (u, v)$, if the pair $(u, v)$ is a crossing pair in $G_1$, then the
process replies 1, and otherwise it replies 0.

• For a random new-neighbor query $q_t = u$ the process answers with a random neighbor of $u$ that
has not yet been observed by the algorithm. That is, for every vertex $v$ such that
$v \in \Gamma(u) \setminus \Gamma^{kn}_\pi(u)$ the process replies $a_t = v$ with probability
$1/(\sqrt{m} - d^{kn}_\pi(u))$.

The process $P_2$ is defined as follows:

• For a query-answer history $\pi$ we denote by $\mathcal{G}_2(\pi) \subset \mathcal{G}_2$ the
subset of graphs in $\mathcal{G}_2$ that are consistent with $\pi$.

• For every $t \le Q$ and every query-answer history $\pi$ of length $t - 1$, the process $P_2$
selects a graph in $\mathcal{G}_2(\pi)$ uniformly at random and answers the $t$-th query as
follows.

  1. If the $t$-th query is a pair query $q_t = (u, v)$, then $P_2$ answers the query $q_t$
     according to the selected graph.

  2. If the $t$-th query is a random new-neighbor query $q_t = u_t$, then $P_2$'s answer is a
     uniform new neighbor of $u_t$ in the selected graph.

• After all queries are answered (i.e., after $Q$ queries), uniformly choose a random graph $G$
from $\mathcal{G}_2(\pi)$.

For a query-answer history $\pi$ of length $Q$ we denote by $\pi^{\le t}$ the length $t$ prefix of
$\pi$ and by $\pi^{\ge t}$ the length $Q - t + 1$ suffix of $\pi$.

We note that the selected graph is only used to answer the $t$-th query and is then “discarded
back to” the remaining graphs that are consistent with that answer (and all previous answers in
$\pi$).

Claim 16. Let $\pi$ be a query-answer history of length $t - 1$. We use $\circ$ to denote
concatenation.

• If the $t$-th query is a pair query, then $a_t = 1$ with probability

$$\frac{|\mathcal{G}_2(\pi \circ (q_t, 1))|}{|\mathcal{G}_2(\pi)|},$$

and $a_t = 0$ with probability

$$\frac{|\mathcal{G}_2(\pi \circ (q_t, 0))|}{|\mathcal{G}_2(\pi)|}.$$

• If the $t$-th query is a random new-neighbor query $q_t = u_t$, then for every
$v \in V \setminus \Gamma^{kn}_\pi(u_t)$ the probability that the process $P_2$ answers $a_t = v$
is

$$\frac{|\mathcal{G}_2(\pi \circ (q_t, v))|}{|\mathcal{G}_2(\pi)|} \cdot \frac{1}{\sqrt{m} - d^{kn}_\pi(u_t)}.$$

If $v \in \Gamma^{kn}_\pi(u_t)$ then the probability that $P_2$ answers $a_t = v$ is 0.


Proof. First consider a pair query $q_t = (u_t, v_t)$. The probability that $(u_t, v_t)$ is an
edge in the graph chosen by the process $P_2$ is the fraction of graphs in $\mathcal{G}_2(\pi)$ in
which $(u_t, v_t)$ is an edge. This is exactly
$\frac{|\mathcal{G}_2(\pi \circ (q_t, 1))|}{|\mathcal{G}_2(\pi)|}$. Similarly, the probability of
choosing a graph in which $(u_t, v_t)$ is not an edge is
$\frac{|\mathcal{G}_2(\pi \circ (q_t, 0))|}{|\mathcal{G}_2(\pi)|}$.

Now consider a random new-neighbor query $q_t = u_t$. We start with the case that
$v \in V \setminus \Gamma^{kn}_\pi(u_t)$. The probability that $v$ is chosen by $P_2$ is the
probability that a graph $G$ in which $v$ is a neighbor of $u_t$ is chosen in the first step, and
that $v$ is the chosen new neighbor among all of $u_t$’s new neighbors in the second step. Since
there are $|\mathcal{G}_2(\pi \circ (q_t, v))|$ graphs in which $v$ is a neighbor of $u_t$, and
$u_t$ has $\sqrt{m} - d^{kn}_\pi(u_t)$ new neighbors, this happens with probability

$$\frac{|\mathcal{G}_2(\pi \circ (q_t, v))|}{|\mathcal{G}_2(\pi)|} \cdot \frac{1}{\sqrt{m} - d^{kn}_\pi(u_t)}.$$

For a vertex v such that v ^ V \ F^", in every graph G £ Q 2 ^ v is not a neighbor of ut, implying 
that the probability that the process replies at = v is 0. □ 

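Lemma 17. For every algorithm ALG, the process $P_2$, when interacting with ALG, answers ALG’s
queries according to a uniformly generated graph $G$ in $\mathcal{G}_2$.

Proof. Consider a specific graph $G \in \mathcal{G}_2$. Let $\pi$ be the query-answer history
generated by the interaction between ALG and $P_2$. Let $Q$ be the number of queries performed
during the interaction. The probability that $G$ is the resulting graph from that interaction is

$$\Pr[G \in \mathcal{G}_2(\pi^{\le 1})] \cdot \Pr[G \in \mathcal{G}_2(\pi^{\le 2}) \mid G \in \mathcal{G}_2(\pi^{\le 1})] \cdots \Pr[G \in \mathcal{G}_2(\pi^{\le Q}) \mid G \in \mathcal{G}_2(\pi^{\le Q-1})] \cdot \frac{1}{|\mathcal{G}_2(\pi^{\le Q})|}$$
$$= \frac{|\mathcal{G}_2(\pi^{\le 1})|}{|\mathcal{G}_2|} \cdot \frac{|\mathcal{G}_2(\pi^{\le 2})|}{|\mathcal{G}_2(\pi^{\le 1})|} \cdots \frac{|\mathcal{G}_2(\pi^{\le Q})|}{|\mathcal{G}_2(\pi^{\le Q-1})|} \cdot \frac{1}{|\mathcal{G}_2(\pi^{\le Q})|} = \frac{1}{|\mathcal{G}_2|},$$

and the lemma follows. □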
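The telescoping product can be verified exactly on a toy family. In the sketch below, which is an illustration rather than the paper's construction, a "graph" is just a red matching encoded as a permutation (left vertex `i` is red-matched to right vertex `p[i]`), only pair queries on crossing pairs are asked, and the query sequence is fixed in advance. All names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def answer(graph, query):
    """Pair query on a crossing pair (i, j): 0 if it is a red pair (non-edge), else 1."""
    i, j = query
    return 0 if graph[i] == j else 1


def final_graph_distribution(family, queries):
    """Exact distribution of the final graph under lazy, answer-by-answer sampling."""
    dist = {g: Fraction(0) for g in family}

    def recurse(consistent, remaining, prob):
        if not remaining:
            # After the last query, pick uniformly among the consistent graphs.
            for g in consistent:
                dist[g] += prob / len(consistent)
            return
        query, rest = remaining[0], remaining[1:]
        for a in (0, 1):
            nxt = [g for g in consistent if answer(g, query) == a]
            if nxt:
                # The answer a is given with probability |consistent with a| / |consistent|.
                recurse(nxt, rest, prob * Fraction(len(nxt), len(consistent)))

    recurse(list(family), list(queries), Fraction(1))
    return dist
```

Every graph in the toy family ends up with probability exactly $1/|\text{family}|$, as the lemma predicts.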


For a fixed algorithm ALG that performs Q queries, and for b £ {1,2}, let P\i^q denote the 
distribution on query-answers histories of length Q induced by the interaction between ALG and 

3/2 

Ph- We shall show that for every algorithm ALG that performs at most Q = queries, the 
statistical distance between and , denoted d is at most This will 

imply that the lower bound stated in Theorem 15 holds for the case that t{G) = m. In order to 
obtain this bound we introduce the notion of a query-answer witness pair, defined next. 


Definition 6. We say that ALG has detected a query-answer witness pair in three cases:

1. If $q_t$ is a pair query for a crossing pair $(u_t, v_t) \in L \times R$ and $a_t = 0$.

2. If $q_t$ is a pair query for a non-crossing pair $(u_t, v_t) \in (L \times L) \cup (R \times R)$
   and $a_t = 1$.

3. If $q_t = u_t$ is a random new-neighbor query and $a_t = v$ for some $v$ such that $(u_t, v)$
   is a non-crossing pair.
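The three cases translate directly into a predicate. In this sketch the encoding is an assumption: a pair query is represented as a tuple `(u, v)` with a 0/1 answer, a random new-neighbor query is represented by the queried vertex itself with the returned vertex as the answer, and every vertex involved lies in `L` or `R`.

```python
def is_witness(query, answer, L, R):
    """Return True if the query-answer pair is a witness pair in the sense of Definition 6."""
    if isinstance(query, tuple):
        # Pair query on (u, v).
        u, v = query
        crossing = (u in L) != (v in L)
        # Cases 1 and 2: a crossing non-edge, or a non-crossing edge.
        return (crossing and answer == 0) or (not crossing and answer == 1)
    # Case 3: new-neighbor query on u answered by v on the same side.
    u, v = query, answer
    return (u in L) == (v in L)
```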

We note that the source of the difference between the two induced distributions is not only due
to the probability that the query-answer history contains a witness pair (which is 0 under the
distribution induced by $P_1$ and non-0 under the distribution induced by $P_2$). There is also a
difference in the distribution over answers to random new-neighbor queries when the answers do not
result in witness pairs (in particular when we condition on the query-answer history prior to the
$t$-th query). However, the analysis of witness pairs also serves us in bounding the contribution
to the distance due to random new-neighbor queries that do not result in a witness pair.

Let $w$ be a “witness function”, such that for a pair query $q_t$ on a crossing pair,
$w(q_t) = 0$, and for a non-crossing pair, $w(q_t) = 1$. The probability that ALG detects a
witness pair when $q_t$ is a pair query $(u_t, v_t)$ and $\pi$ is a query-answer history of length
$t - 1$, is

$$\Pr_{P_2}[w(q_t) \mid \pi] = \frac{|\mathcal{G}_2(\pi \circ (q_t, w(q_t)))|}{|\mathcal{G}_2(\pi)|} \le \frac{|\mathcal{G}_2(\pi \circ (q_t, w(q_t)))|}{|\mathcal{G}_2(\pi \circ (q_t, 1 - w(q_t)))|}.$$

Therefore, to bound the probability that the algorithm observes a witness pair it is sufficient to
bound the ratio between the number of graphs in $\mathcal{G}_2(\pi \circ (q_t, w(q_t)))$ and the
number of graphs in $\mathcal{G}_2(\pi \circ (q_t, 1 - w(q_t)))$. We do this by introducing an
auxiliary graph, which is defined next.
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4.1.3 The auxiliary graph for t = m 

For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ for which $\pi$ is
consistent with $G_1$ (that is, no witness pair has yet been detected), and every pair $(u, v)$,
we consider a bipartite auxiliary graph $A_{\pi,(u,v)}$. On one side of $A_{\pi,(u,v)}$ we have a
node for every graph in $\mathcal{G}_2(\pi)$ for which the pair $(u, v)$ is a witness pair. We
refer to these nodes as witness graphs. On the other side of the auxiliary graph, we place a node
for every graph in $\mathcal{G}_2(\pi)$ for which the pair is not a witness. We refer to these
nodes as non-witness graphs. We put an edge in the auxiliary graph between a witness graph $W$ and
a non-witness graph $\overline{W}$ if the pair $(u, v)$ is a crossing (non-crossing) pair and the
two graphs are identical except that their red (blue) matchings differ on exactly two pairs:
$(u, v)$ and one additional pair. In other words, $\overline{W}$ can be obtained from $W$ by
performing a switch operation, as defined next.

Definition 7. We define a switch between pairs in a matching in the following manner. Let
$(u, v)$ and $(u', v')$ be two matched pairs in a matching $M$. A switch between $(u, v)$ and
$(u', v')$ means removing the edges $(u, v)$ and $(u', v')$ from $M$ and adding to it the edges
$(u, v')$ and $(u', v)$.

Note that the switch process maintains the cardinality of the matching. We denote by
$d_w(A_{\pi,(u,v)})$ the minimal degree of any witness graph in $A_{\pi,(u,v)}$, and by
$d_{nw}(A_{\pi,(u,v)})$ the maximal degree of the non-witness graphs. See Figure 3 for an
illustration.

Figure 3: (a) The auxiliary graph with witness nodes on the left and non-witness nodes on the
right. (b) An illustration of two neighbors in the auxiliary graph for $t = m$.
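The switch operation of Definition 7 is small enough to state as code. This sketch assumes a matching is represented as a set of ordered pairs `(u, v)`; the function name is illustrative.

```python
def switch(matching, pair_a, pair_b):
    """Switch two matched pairs (u, v) and (u2, v2) into (u, v2) and (u2, v)."""
    if pair_a not in matching or pair_b not in matching:
        raise ValueError("both pairs must belong to the matching")
    (u, v), (u2, v2) = pair_a, pair_b
    # Remove the two old pairs and add the two crossed pairs; the size is unchanged.
    return (set(matching) - {pair_a, pair_b}) | {(u, v2), (u2, v)}
```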
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Lemma 18. Let t = m and Q = For every t < Q, every query-answer history vr of length 

$t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u, v)$,

$$\frac{d_{nw}(A_{\pi,(u,v)})}{d_w(A_{\pi,(u,v)})} \le \frac{2}{\sqrt{m}} = \frac{2t}{m^{3/2}}.$$


Proof. Recall that the graphs in $\mathcal{G}_2$ are as defined in Subsection 4.1.1 and
illustrated in Figure 2. In the following we consider crossing pairs, as the proof for
non-crossing pairs is almost identical. Recall that a crossing pair is a pair $(u, v)$ such that
$u \in L$ and $v \in R$ or vice versa. A witness graph $W$ with respect to the pair $(u, v)$ is a
graph in which $(u, v)$ is a red pair, i.e., $(u, v)$ belongs to the red matching of $W$. There is
an edge from $W$ to every non-witness graph $\overline{W} \in \mathcal{G}_2(\pi)$ such that the red
matchings of $W$ and $\overline{W}$ differ exactly on $(u, v)$ and one additional edge.
Every red pair $(u',v') \in M^C(W)$ creates a potential non-witness graph $\overline{W}_{(u',v')}$ when switched with $(u,v)$ (as defined in Definition 7). However, not all of these non-witness graphs are in $\mathcal{G}_2(\pi)$. If $u'$ is a neighbor of $v$ in the knowledge graph $G^{kn}_{\pi}$, i.e., $u' \in \Gamma^{kn}_{\pi}(v)$, then $\overline{W}_{(u',v')}$ is not consistent with the knowledge graph, and therefore $\overline{W}_{(u',v')} \notin \mathcal{G}_2(\pi)$. This is also the case for a pair $(u',v')$ such that $v' \in \Gamma^{kn}_{\pi}(u)$. Therefore, only pairs $(u',v') \in M^C(W)$ such that $u' \notin \Gamma^{kn}_{\pi}(v)$ and $v' \notin \Gamma^{kn}_{\pi}(u)$ produce a non-witness graph $\overline{W}_{(u',v')} \in \mathcal{G}_2(\pi)$ when switched with $(u,v)$. We refer to these pairs as consistent pairs. Since $t \le \frac{\sqrt{m}}{100}$, both $u$ and $v$ each have at most $\frac{\sqrt{m}}{100}$ neighbors in the knowledge graph, implying that out of the $\sqrt{m} - 1$ potential pairs, the number of consistent pairs is at least

$$\sqrt{m} - 1 - d^{kn}_{\pi}(u) - d^{kn}_{\pi}(v) \ge \sqrt{m} - 1 - 2\cdot\frac{\sqrt{m}}{100} \ge \frac{1}{2}\sqrt{m}.$$

Therefore, the degree of every witness graph $W \in \mathcal{A}_{\pi,(u,v)}$ is at least $\frac{1}{2}\sqrt{m}$, implying that

$$d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{2}\sqrt{m}.$$

In order to prove that $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le 1$, consider a non-witness graph $\overline{W}$. Since $\overline{W}$ is a non-witness graph, the pair $(u,v)$ is not a red pair. This implies that $u$ is matched to some vertex $v' \in R$, and $v$ is matched to some vertex $u' \in L$. That is, $(u,v'), (v,u') \in M^C(\overline{W})$. By the construction of the edges in the auxiliary graph, every neighbor $W$ of $\overline{W}$ can be obtained by a single switch between two red pairs in the red matching. The only possibility to switch two pairs in $M^C(\overline{W})$ and obtain a matching in which $(u,v)$ is a red pair is to switch the pairs $(u,v')$ and $(v,u')$. Hence, every non-witness graph $\overline{W}$ has at most one neighbor.

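We showed that $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{2}\sqrt{m}$ and that $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le 1$, implying

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2}{\sqrt{m}} = \frac{2t}{m^{3/2}},$$

and the proof is complete. □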
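The two degree bounds in this proof can be checked by brute force on a tiny instance. The sketch below is illustrative only and is not part of the original argument: it assumes an empty knowledge graph (so every switch is consistent), represents the red matching as a permutation, and uses hypothetical helper names (`switched`, `aux_degrees`).

```python
from itertools import permutations

def switched(p, i, j):
    """Return the matching obtained by switching the red pairs of left vertices i and j."""
    q = list(p)
    q[i], q[j] = q[j], q[i]
    return tuple(q)

def aux_degrees(s, u, v):
    """Degrees in the auxiliary graph for the pair (u, v), with s vertices per side.

    A red matching is a tuple p: left vertex i is matched to right vertex p[i].
    Witnesses are the matchings with p[u] == v. Assumption: no pair has been
    queried yet, so no switch is ruled out by the knowledge graph.
    """
    witness_deg, nonwitness_deg = {}, {}
    for p in permutations(range(s)):
        (witness_deg if p[u] == v else nonwitness_deg)[p] = 0
    for w in witness_deg:
        for j in range(s):
            if j == u:
                continue
            nb = switched(w, u, j)  # differs from w on (u, v) and one more pair
            witness_deg[w] += 1
            nonwitness_deg[nb] += 1
    return witness_deg, nonwitness_deg

wd, nwd = aux_degrees(s=5, u=0, v=0)
d_w = min(wd.values())    # minimal witness degree
d_nw = max(nwd.values())  # maximal non-witness degree
print(d_w, d_nw, len(wd), len(nwd))
```

With no observed pairs every witness has degree $s - 1$ (here $s$ plays the role of $\sqrt{m}$), and every non-witness has exactly one neighbor, matching the structure of the argument above.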


4.1.4 Statistical distance 


For a query-answer history $\pi$ of length $t-1$ and a query $q_t$, let $Ans(\pi, q_t)$ denote the set of possible answers to the query $q_t$ that are consistent with $\pi$. Namely, if $q_t$ is a pair query (for a pair that does not belong to the knowledge graph $G^{kn}_{\pi}$) then $Ans(\pi,q_t) = \{0,1\}$, and if $q_t = u$ is a random new-neighbor query, then $Ans(\pi,q_t)$ consists of all vertices except those in $\Gamma^{kn}_{\pi}(u)$.

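Lemma 19. Let $t = m$ and $Q = \frac{m^{3/2}}{100t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t-1$ such that $\pi$ is consistent with $G_1$ and for every query $q_t$:

$$\sum_{a \in Ans(\pi,q_t)} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| \le \frac{12t}{m^{3/2}} = \frac{12}{\sqrt{m}}.$$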


Proof. We prove the lemma separately for each type of query.

• We start with a crossing pair query $(u_t,v_t)$. In this case the witnesses are red pairs. Namely, our witness graphs for this case are all the graphs in $\mathcal{G}_2(\pi \circ (q_t, 0))$, and the non-witness graphs are all the graphs in $\mathcal{G}_2(\pi \circ (q_t, 1))$. By the construction of the auxiliary graph

$$|\mathcal{G}_2(\pi \circ (q_t,0))| \cdot d_w(\mathcal{A}_{\pi,(u,v)}) \le |\mathcal{G}_2(\pi \circ (q_t,1))| \cdot d_{nw}(\mathcal{A}_{\pi,(u,v)}).$$

This, together with Lemma 18, implies

$$\frac{|\mathcal{G}_2(\pi \circ (q_t,0))|}{|\mathcal{G}_2(\pi)|} \le \frac{|\mathcal{G}_2(\pi \circ (q_t,0))|}{|\mathcal{G}_2(\pi \circ (q_t,1))|} \le \frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2}{\sqrt{m}} = \frac{2t}{m^{3/2}}.$$

For a pair query $q_t$, the set of possible answers $Ans(\pi,q_t)$ is $\{0,1\}$. Therefore,

$$\sum_{a \in \{0,1\}} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| = \left| \Pr_{P_1}[0 \mid \pi, q_t] - \Pr_{P_2}[0 \mid \pi, q_t] \right| + \left| \Pr_{P_1}[1 \mid \pi, q_t] - \Pr_{P_2}[1 \mid \pi, q_t] \right|$$
$$\le \frac{2t}{m^{3/2}} + 1 - \left(1 - \frac{2t}{m^{3/2}}\right) = \frac{4t}{m^{3/2}} = \frac{4}{\sqrt{m}}. \qquad (13)$$

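• For a non-crossing pair query $q_t = (u,v)$ our witness graphs are graphs that contain $q_t$ as a blue pair, i.e., graphs from $\mathcal{G}_2(\pi \circ (q_t, 1))$, and our non-witness graphs are graphs in which no blue pair had been queried, i.e., graphs from $\mathcal{G}_2(\pi \circ (q_t, 0))$. From Lemma 18 we get that for a non-crossing pair query $q_t$,

$$\frac{|\mathcal{G}_2(\pi \circ (q_t,1))|}{|\mathcal{G}_2(\pi)|} \le \frac{|\mathcal{G}_2(\pi \circ (q_t,1))|}{|\mathcal{G}_2(\pi \circ (q_t,0))|} \le \frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2t}{m^{3/2}} = \frac{2}{\sqrt{m}}.$$

Therefore,

$$\sum_{a \in \{0,1\}} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| = \left| \Pr_{P_1}[0 \mid \pi, q_t] - \Pr_{P_2}[0 \mid \pi, q_t] \right| + \left| \Pr_{P_1}[1 \mid \pi, q_t] - \Pr_{P_2}[1 \mid \pi, q_t] \right|$$
$$\le 1 - \left(1 - \frac{2t}{m^{3/2}}\right) + \frac{2t}{m^{3/2}} = \frac{4t}{m^{3/2}} = \frac{4}{\sqrt{m}}. \qquad (14)$$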


• For a new-neighbor query $q_t = u_t$, the set of possible answers $Ans(\pi, q_t)$ is the set of all the vertices in the graph. Therefore,

$$\sum_{a \in Ans(\pi,q_t)} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| = \sum_{v \in R} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| + \sum_{v \in L} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right|.$$

Recall that for a vertex $v \in \Gamma^{kn}_{\pi}(u)$, $\Pr_{P_1}[v \mid \pi, q_t] = \Pr_{P_2}[v \mid \pi, q_t] = 0$. Therefore, it suffices to consider only vertices $v$ such that $v \notin \Gamma^{kn}_{\pi}(u)$. Assume without loss of generality that $u \in L$, and consider a vertex $v \in R$, $v \notin \Gamma^{kn}_{\pi}(u)$. Since for every $v \in R$ we have that $(u_t, v) \in E(G_1)$, by the definition of $P_1$,

$$\Pr_{P_1}[v \mid \pi, q_t] = \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u_t)}. \qquad (15)$$


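Now consider the process $P_2$. By its definition,

$$\Pr_{P_2}[v \mid \pi, q_t] = \frac{|\mathcal{G}_2(\pi \circ (q_t,v))|}{|\mathcal{G}_2(\pi)|} \cdot \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u)} = \frac{|\mathcal{G}_2(\pi \circ ((u,v),1))|}{|\mathcal{G}_2(\pi)|} \cdot \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u)} = \left(1 - \frac{|\mathcal{G}_2(\pi \circ ((u,v),0))|}{|\mathcal{G}_2(\pi)|}\right) \cdot \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u)}.$$

By the first item in the proof, for any crossing pair $q_t = (u,v)$,

$$\frac{|\mathcal{G}_2(\pi \circ (q_t,0))|}{|\mathcal{G}_2(\pi)|} \le \frac{4t}{m^{3/2}} = \frac{4}{\sqrt{m}},$$

and it follows that

$$\Pr_{P_2}[v \mid \pi, q_t] \ge \left(1 - \frac{4t}{m^{3/2}}\right) \cdot \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u)}. \qquad (16)$$

By Equations (15) and (16), we get that for every $v \in R$ such that $v \notin \Gamma^{kn}_{\pi}(u)$,

$$\left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| \le \frac{4t/m^{3/2}}{\sqrt{m} - d^{kn}_{\pi}(u)}. \qquad (17)$$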


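Therefore,

$$\sum_{v \in R} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| = \sum_{v \in R,\, v \notin \Gamma^{kn}_{\pi}(u)} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| \le \left(\sqrt{m} - d^{kn}_{\pi}(u)\right) \cdot \frac{4t/m^{3/2}}{\sqrt{m} - d^{kn}_{\pi}(u)} = \frac{4t}{m^{3/2}} = \frac{4}{\sqrt{m}}. \qquad (18)$$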


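Now consider a vertex $v \in L$. Observe that for every $v \in L$, it holds that $v \notin \Gamma^{kn}_{\pi}(u_t)$ since otherwise $\pi$ is not consistent with $G_1$. For the same reason,

$$\Pr_{P_1}[v \mid \pi, q_t] = 0. \qquad (19)$$

As for $P_2$, as before,

$$\Pr_{P_2}[v \mid \pi, q_t] = \frac{|\mathcal{G}_2(\pi \circ ((u_t,v),1))|}{|\mathcal{G}_2(\pi)|} \cdot \frac{1}{\sqrt{m} - d^{kn}_{\pi}(u_t)}.$$

By the second item of the claim, since for every $v \in L$, $(u_t,v)$ is a non-crossing pair, we have that

$$\frac{|\mathcal{G}_2(\pi \circ ((u_t,v),1))|}{|\mathcal{G}_2(\pi)|} \le \frac{4t}{m^{3/2}} = \frac{4}{\sqrt{m}}. \qquad (20)$$

Combining Equations (19) and (20) we get that for every $v \in L$,

$$\left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| \le \frac{4t/m^{3/2}}{\sqrt{m} - d^{kn}_{\pi}(u)}.$$

Since $Q = \frac{m^{3/2}}{100t}$, for every $t \le Q$, $d^{kn}_{\pi}(u) \le \frac{\sqrt{m}}{100}$ and it follows that $\frac{\sqrt{m} - 1}{\sqrt{m} - d^{kn}_{\pi}(u)}$ is bounded by 2. Hence,

$$\sum_{v \in L} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| \le (\sqrt{m} - 1) \cdot \frac{4t/m^{3/2}}{\sqrt{m} - d^{kn}_{\pi}(u)} \le \frac{8t}{m^{3/2}} = \frac{8}{\sqrt{m}}. \qquad (21)$$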

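By Equations (18) and (21) we get

$$\sum_{v \in R} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| + \sum_{v \in L} \left| \Pr_{P_1}[v \mid \pi, q_t] - \Pr_{P_2}[v \mid \pi, q_t] \right| \le \frac{12t}{m^{3/2}} = \frac{12}{\sqrt{m}}. \qquad (22)$$

This completes the proof. □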
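The counting step used for pair queries in the proof above is a double-counting argument on the bipartite auxiliary graph: if every witness node has degree at least $d_w$ and every non-witness node has degree at most $d_{nw}$, then the number of witnesses times $d_w$ is at most the number of edges, which is at most the number of non-witnesses times $d_{nw}$. The following sketch checks this on a small hand-made bipartite graph; the graph and the helper name `degree_bounds` are illustrative assumptions, not objects from the paper.

```python
from fractions import Fraction

def degree_bounds(edges, num_w, num_nw):
    """Return (min witness degree, max non-witness degree) of a bipartite graph.

    edges is a set of pairs (i, j) with witness index i and non-witness index j.
    """
    deg_w = [0] * num_w
    deg_nw = [0] * num_nw
    for i, j in edges:
        deg_w[i] += 1
        deg_nw[j] += 1
    return min(deg_w), max(deg_nw)

# Hypothetical auxiliary graph: 4 witnesses, 12 non-witnesses.
num_w, num_nw = 4, 12
edges = {(i, j) for i in range(num_w) for j in range(num_nw) if j % num_w == i}
edges |= {(0, j) for j in range(6)}  # make the graph irregular

d_w, d_nw = degree_bounds(edges, num_w, num_nw)

# Double counting: |W| * d_w <= |E| <= |NW| * d_nw.
lower = num_w * d_w
upper = num_nw * d_nw
ratio_of_sizes = Fraction(num_w, num_nw)
ratio_of_degrees = Fraction(d_nw, d_w)
print(lower, len(edges), upper, ratio_of_sizes, ratio_of_degrees)
```

Dividing the two inequalities gives the bound on the fraction of witness graphs that the proof combines with Lemma 18.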

Recall that $D^{ALG}_b$, $b \in \{1,2\}$, denotes the distribution on query-answer histories of length $Q$, induced by the interaction of ALG and $P_b$. We show that the two distributions are indistinguishable for $Q$ that is sufficiently small.

Lemma 20. Let $t = m$. For every algorithm ALG that asks at most $Q = \frac{m^{3/2}}{100t}$ queries, the statistical distance between $D^{ALG}_1$ and $D^{ALG}_2$ is at most $\frac{1}{3}$.

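Proof. Consider the following hybrid distribution. Let $D^{ALG}_{1,t}$ be the distribution over query-answer histories of length $Q$, where in the length $t$ prefix ALG is answered by the process $P_1$ and in the length $Q - t$ suffix ALG is answered by the process $P_2$. Observe that $D^{ALG}_{1,Q} = D^{ALG}_1$ and that $D^{ALG}_{1,0} = D^{ALG}_2$. Let $\pi = (\pi_1, \pi_2, \ldots, \pi_\ell)$ denote a query-answer history of length $\ell$. By the triangle inequality,

$$d(D^{ALG}_1, D^{ALG}_2) \le \sum_{t=0}^{Q-1} d(D^{ALG}_{1,t+1}, D^{ALG}_{1,t}).$$

It thus remains to bound $d(D^{ALG}_{1,t+1}, D^{ALG}_{1,t}) = \frac{1}{2}\sum_{\pi} \left| \Pr_{D^{ALG}_{1,t+1}}[\pi] - \Pr_{D^{ALG}_{1,t}}[\pi] \right|$ for every $t$ such that $0 \le t \le Q - 1$. Let $\mathcal{Q}$ denote the set of all possible queries.

$$\sum_{\pi} \left| \Pr_{D^{ALG}_{1,t+1}}[\pi] - \Pr_{D^{ALG}_{1,t}}[\pi] \right| = \sum_{\pi_1,\ldots,\pi_{t-1}} \Pr_{P_1,ALG}[\pi_1,\ldots,\pi_{t-1}] \cdot \sum_{q \in \mathcal{Q}} \Pr_{ALG}[q \mid \pi_1,\ldots,\pi_{t-1}]$$
$$\cdot \sum_{a \in Ans((\pi_1,\ldots,\pi_{t-1}),q)} \left| \Pr_{P_1}[a \mid \pi_1,\ldots,\pi_{t-1},q] - \Pr_{P_2}[a \mid \pi_1,\ldots,\pi_{t-1},q] \right| \cdot \sum_{\pi_{t+1},\ldots,\pi_Q} \Pr_{P_2,ALG}[\pi_{t+1},\ldots,\pi_Q \mid \pi_1,\ldots,\pi_{t-1},(q,a)].$$

By Lemma 19, for every $1 \le t \le Q - 1$, and every $\pi_1,\ldots,\pi_{t-1}$ and $q$,

$$\sum_{a \in Ans((\pi_1,\ldots,\pi_{t-1}),q)} \left| \Pr_{P_1}[a \mid \pi_1,\ldots,\pi_{t-1},q] - \Pr_{P_2}[a \mid \pi_1,\ldots,\pi_{t-1},q] \right| \le \frac{12t}{m^{3/2}}.$$

We also have that for every pair $(q,a)$,

$$\sum_{\pi_{t+1},\ldots,\pi_Q} \Pr_{P_2,ALG}[\pi_{t+1},\ldots,\pi_Q \mid \pi_1,\ldots,\pi_{t-1},(q,a)] = 1.$$

Therefore,

$$\sum_{\pi} \left| \Pr_{D^{ALG}_{1,t+1}}[\pi] - \Pr_{D^{ALG}_{1,t}}[\pi] \right| \le \sum_{\pi_1,\ldots,\pi_{t-1}} \Pr_{P_1,ALG}[\pi_1,\ldots,\pi_{t-1}] \sum_{q \in \mathcal{Q}} \Pr_{ALG}[q \mid \pi_1,\ldots,\pi_{t-1}] \cdot \frac{12t}{m^{3/2}} = \frac{12t}{m^{3/2}}.$$

Hence, for $Q = \frac{m^{3/2}}{100t}$,

$$d(D^{ALG}_1, D^{ALG}_2) = \frac{1}{2}\sum_{\pi} \left| \Pr_{D^{ALG}_1}[\pi] - \Pr_{D^{ALG}_2}[\pi] \right| \le \frac{1}{2}\sum_{t=1}^{Q-1} \frac{12t}{m^{3/2}} \le \frac{1}{3},$$

and the proof is complete. □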
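The hybrid argument in this proof can be illustrated numerically. The following toy sketch is an assumption-laden simplification, not the paper's setting: the "algorithm" is non-adaptive, every answer is a single bit, process 1 answers each query with an independent Bernoulli(`a`) bit and process 2 with a Bernoulli(`b`) bit. It checks that the distance between the two end distributions is at most the sum of distances between consecutive hybrids.

```python
from itertools import product

def history_distribution(Q, t, a, b):
    """Distribution over {0,1}^Q: first t answers are Bernoulli(a), the rest Bernoulli(b)."""
    dist = {}
    for h in product((0, 1), repeat=Q):
        pr = 1.0
        for i, bit in enumerate(h):
            p_one = a if i < t else b
            pr *= p_one if bit == 1 else 1.0 - p_one
        dist[h] = pr
    return dist

def statistical_distance(d1, d2):
    """Half the L1 distance between two distributions on the same support."""
    return 0.5 * sum(abs(d1[h] - d2[h]) for h in d1)

Q, a, b = 6, 0.5, 0.45
hybrids = [history_distribution(Q, t, a, b) for t in range(Q + 1)]

# Consecutive hybrids differ only in how one query is answered.
steps = [statistical_distance(hybrids[t + 1], hybrids[t]) for t in range(Q)]
total = statistical_distance(hybrids[Q], hybrids[0])
print(total, sum(steps))
```

In this toy model each step equals $|a - b|$, so the end-to-end distance is at most $Q \cdot |a - b|$, which mirrors how the per-query bound of Lemma 19 is summed over at most $Q$ queries.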
In the next subsection we turn to prove the theorem for the cases where $m \le t \le \frac{m^{3/2}}{8}$ and for the case where $\sqrt{m} \le t \le \frac{m}{4}$. We start with the former case. The proof will follow the building blocks of the proof for $t = m$, where the only difference is in the description of the auxiliary graph $\mathcal{A}_{\pi,(u,v)}$ and in the proof that

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2t}{m^{3/2}} = \frac{2r}{\sqrt{m}}.$$


4.2 A lower bound for $m < t \le \frac{m^{3/2}}{8}$

Let $t = r \cdot m$ for an integer $r$ such that $1 \le r \le \frac{\sqrt{m}}{8}$. It is sufficient for our needs to consider only values of $t$ for which $r$ is an integer. The proof of the lower bound for this case is a fairly simple extension of the proof for the case of $t = m$, that is, $r = 1$. We next describe the modifications we make in the construction of $\mathcal{G}_2$.


4.2.1 The lower-bound construction 

Let $G_1$ be as defined in Subsection 4.1.1. The construction of $\mathcal{G}_2$ for $t = r \cdot m$ can be thought of as repeating the construction of $\mathcal{G}_2$ for $t = m$ (as described in Subsection 4.1.1) $r$ times. We again start with a complete bipartite graph $K_{\sqrt{m},\sqrt{m}}$ and an independent set of size $n - 2\sqrt{m}$. For each graph $G \in \mathcal{G}_2$ we select $r$ perfect matchings between the two sides $R$ and $L$ and remove these edges from the graph. We denote the $r$ perfect matchings by $M^C_1, \ldots, M^C_r$ and refer to them as the red matchings. We require that each two perfect matchings $M^C_i$ and $M^C_j$ do not have any shared edges. That is, for every $i$ and for every $j \ne i$, for every $(u,v) \in M^C_i$ it holds that $(u,v) \notin M^C_j$. In order to maintain the degrees of the vertices, we next select $r$ perfect matchings for each side of the bipartite graph ($L$ to $L$ and $R$ to $R$). We denote these matchings by $M^L_1, \ldots, M^L_r$ and $M^R_1, \ldots, M^R_r$, respectively. Again we require that no two matchings share an edge. We refer to these matchings as the blue matchings and their edges as blue pairs. Each such choice of $3r$ matchings defines a graph in $\mathcal{G}_2$.

Let $G$ be a graph in $\mathcal{G}_2$. We say that a triangle is blue if all its edges are blue. Otherwise we say the triangle is mixed. Observe that every blue edge in $G$ participates in at least $\sqrt{m} - 2r$ mixed triangles, and at most $r$ blue triangles. Also note that every two mixed triangles are disjoint.
Therefore, there are at least {2y/m — 2r) = Q,{r-m) and at most i2ry/m-{2^/m — 2r) + r‘^^/m 

triangles in $G$. Since $r \le \frac{\sqrt{m}}{8}$ we get that every graph in $\mathcal{G}_2$ has $\Theta(r \cdot m)$ triangles.
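As a sanity check of this construction, the following sketch builds one small member of the family and counts its triangles by brute force. Everything specific here is an illustrative assumption: `s` stands for $\sqrt{m}$, the red matchings are cyclic shifts, the blue matchings come from a round-robin schedule, and the independent set is omitted since it contains no triangles.

```python
from itertools import combinations

def round_robin_matchings(vertices, r):
    """Return r edge-disjoint perfect matchings on an even-sized vertex list (circle method)."""
    n = len(vertices)
    assert n % 2 == 0 and r <= n - 1
    fixed, rest = vertices[-1], list(vertices[:-1])
    matchings = []
    for rnd in range(r):
        rot = rest[rnd:] + rest[:rnd]
        pairs = [(rot[0], fixed)]
        pairs += [(rot[i], rot[-i]) for i in range(1, n // 2)]
        matchings.append(pairs)
    return matchings

def build_graph(s, r):
    """Complete bipartite graph minus r red matchings, plus r blue matchings per side."""
    L = [("L", i) for i in range(s)]
    R = [("R", i) for i in range(s)]
    red = {frozenset((L[a], R[(a + i) % s])) for i in range(r) for a in range(s)}
    edges = {frozenset((u, v)) for u in L for v in R} - red
    blue = set()
    for side in (L, R):
        for matching in round_robin_matchings(side, r):
            blue |= {frozenset(p) for p in matching}
    return L + R, edges | blue, blue

def count_triangles(vertices, edges):
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if frozenset((a, b)) in edges
        and frozenset((a, c)) in edges
        and frozenset((b, c)) in edges
    )

def degrees(vertices, edges):
    return {v: sum(1 for e in edges if v in e) for v in vertices}

V1, E1, B1 = build_graph(s=8, r=1)
V2, E2, B2 = build_graph(s=8, r=2)
print(count_triangles(V1, E1), count_triangles(V2, E2))
```

Each mixed triangle contains exactly one blue edge, and there are $r \cdot s$ blue edges, so the count is at least $r \cdot s \cdot (s - 2r)$; for $r = 1$ it is exactly $s(s-2)$.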

4.2.2 The processes $P_1$ and $P_2$

The definition of the processes $P_1$ and $P_2$ is the same as in Subsection 4.1.2 (using the modified definition of $\mathcal{G}_2$), and Lemma 17 holds here as well.

4.2.3 The auxiliary graph 

As before, for every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u,v)$, we define a bipartite auxiliary graph $\mathcal{A}_{\pi,(u,v)}$ such that on one side there is a node for every witness graph $W \in \mathcal{G}_2(\pi)$, and on the other side a node for every non-witness graph $\overline{W} \in \mathcal{G}_2(\pi)$. The witness graphs for this case are graphs in which $(u,v)$ is a red (blue) edge in one of the red (blue) matchings. If $(u,v)$ is a crossing pair, then for every witness graph $W$, $(u,v) \in M^C_i(W)$ for some $1 \le i \le r$. If $(u,v)$ is a non-crossing pair, then for every witness graph $W$, $(u,v) \in M^L_i(W)$ or $(u,v) \in M^R_i(W)$. There is an edge from $W$ to every graph $\overline{W}$ such that the matching that contains $(u,v)$ in $W$ and the corresponding matching in $\overline{W}$ differ on exactly two pairs: $(u,v)$ and one additional pair. For example, if $(u,v) \in M^C_i(W)$, there is an edge from $W$ to every graph $\overline{W}$ such that $M^C_i(W)$ and $M^C_i(\overline{W})$ differ on exactly $(u,v)$ and one additional pair.
Lemma 21. Let $t = r \cdot m$ for an integer $r$ such that $1 \le r \le \frac{\sqrt{m}}{8}$ and let $Q = \frac{m^{3/2}}{100t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u,v)$,

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2t}{m^{3/2}} = \frac{2r}{\sqrt{m}}.$$

Proof. We again analyze the case in which the pair is a crossing pair $(u,v)$, as the proof for a non-crossing pair is almost identical. We first consider the minimal degree of the witness graphs in $\mathcal{A}_{\pi,(u,v)}$. Let $M^C_i$ be the matching to which $(u,v)$ belongs. As before, only pairs $(u',v') \in M^C_i$ such that $u' \notin \Gamma^{kn}_{\pi}(v)$ and $v' \notin \Gamma^{kn}_{\pi}(u)$ result in a non-witness graph $\overline{W} \in \mathcal{G}_2(\pi)$ when switched with $(u,v)$. However, we have an additional constraint. Since by our construction no two red matchings share an edge, it must be that $u'$ is not matched to $v$ in any of the other red matchings, and similarly that $u$ is not matched to $v'$ in any of the other matchings. It follows that of the $\left(\sqrt{m} - 1 - 2 \cdot \frac{\sqrt{m}}{100}\right)$ potential pairs (as in the proof of Lemma 18), we discard $2r$ additional pairs. Since $1 \le r \le \frac{\sqrt{m}}{8}$ we remain with $\left(\sqrt{m} - 1 - \frac{1}{50}\sqrt{m} - \frac{1}{4}\sqrt{m}\right) \ge \frac{1}{2}\sqrt{m}$ potential pairs. Thus, $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{2}\sqrt{m}$.

We now turn to consider the degree of the non-witness graphs and prove that $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le r$. Consider a non-witness graph $\overline{W}$. To prove that $\overline{W}$ has at most $r$ neighbors it is easier to consider all the possible options to "turn" $\overline{W}$ from a non-witness graph into a witness graph. It holds that for every $j \in [r]$, $(u,v) \notin M^C_j(\overline{W})$. Therefore for every matching $M^C_j$, $u$ is matched to some vertex, denoted $v'_j$, and $v$ is matched to some vertex, denoted $u'_j$. If we switch between the pairs $(u, v'_j)$ and $(v, u'_j)$, this results in a matching in which $(u,v)$ is a witness pair. We again refer the reader to Figure 3b, where the illustrated matching can be thought of as the matching $M^C_j$. Denote the resulting graph by $W_{(u'_j,v'_j)}$. If the pair $(u'_j,v'_j)$ has not been observed yet by the algorithm then $W_{(u'_j,v'_j)}$ is a witness graph in $\mathcal{A}_{\pi,(u,v)}$. Therefore there are at most $r$ options to turn $\overline{W}$ into a witness graph, and $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le r$. We showed that $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{2}\sqrt{m}$ and $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le r$, implying

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{2r}{\sqrt{m}} = \frac{2t}{m^{3/2}},$$

as required. □

4.2.4 Statistical distance 

The proof of the next lemma is exactly the same as the proof of Lemma 19, except that occurrences of the term $(t/m^{3/2})$ are replaced by $(r/\sqrt{m})$ instead of $(1/\sqrt{m})$, and we apply Lemma 21 instead of Lemma 18.

Lemma 22. Let $t = r \cdot m$ for an integer $r$ such that $1 \le r \le \frac{\sqrt{m}}{8}$ and let $Q = \frac{m^{3/2}}{100t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and for every query $q_t$,

$$\sum_{a \in Ans(\pi,q_t)} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| \le \frac{12t}{m^{3/2}} = \frac{12r}{\sqrt{m}}.$$

The proof of the next lemma is the same as the proof of Lemma 20 except that we replace the application of Lemma 19 by an application of Lemma 22.

Lemma 23. Let $t = r \cdot m$ for an integer $r$ such that $1 \le r \le \frac{\sqrt{m}}{8}$. For every algorithm ALG that performs at most $Q = \frac{m^{3/2}}{100t}$ queries, the statistical distance between $D^{ALG}_1$ and $D^{ALG}_2$ is at most $\frac{1}{3}$.
4.3 A lower bound for $\sqrt{m} \le t \le \frac{1}{4}m$

Similarly to the previous section, we let $t = k\sqrt{m}$ and assume that $k$ is an integer such that $1 \le k \le \frac{\sqrt{m}}{4}$.


4.3.1 The lower-bound construction 


The construction of the graph $G_1$ is as defined in Subsection 4.1.1, and we modify the construction of the graphs in $\mathcal{G}_2$. As before, the basic structure of every graph is a complete bipartite graph $K_{\sqrt{m},\sqrt{m}}$ and an independent set of $n - 2\sqrt{m}$ vertices. In this case, for each graph in $\mathcal{G}_2$, we do not remove a perfect matching from the bipartite graph, but rather a matching $M^C$ of size $k$. In order to keep the degrees of all vertices to be $\sqrt{m}$, we modify the way we construct the blue matchings. Let $M^C = \{(\ell_{i_1}, r_{i_1}), (\ell_{i_2}, r_{i_2}), \ldots, (\ell_{i_k}, r_{i_k})\}$ be the crossing matching. The blue matchings will be $M^L = \{(\ell_{i_1}, \ell_{i_2}), (\ell_{i_3}, \ell_{i_4}), \ldots, (\ell_{i_{k-1}}, \ell_{i_k})\}$ and $M^R = \{(r_{i_1}, r_{i_2}), (r_{i_3}, r_{i_4}), \ldots, (r_{i_{k-1}}, r_{i_k})\}$. Note that every matched pair belongs to a four-tuple $(\ell_{i_j}, \ell_{i_{j+1}}, r_{i_j}, r_{i_{j+1}})$ such that $(\ell_{i_j}, r_{i_j})$ and $(\ell_{i_{j+1}}, r_{i_{j+1}})$ are red pairs and $(\ell_{i_j}, \ell_{i_{j+1}})$ and $(r_{i_j}, r_{i_{j+1}})$ are blue pairs. We refer to these structures as matched squares and to four-tuples $(\ell_x, \ell_y, r_z, r_w)$ such that no pair in the tuple is matched as unmatched squares. See Figure 4 for an illustration. Every graph in $\mathcal{G}_2$ is defined by its set of $k$ four-tuples.

Similarly to previous constructions, in every graph $G \in \mathcal{G}_2$, every blue edge participates in $\sqrt{m} - 2$ triangles. Since every triangle in $G$ contains exactly one blue edge, we have that $G$ has $k \cdot (\sqrt{m} - 2) = \Omega(k\sqrt{m})$ triangles.
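The triangle count stated above can be confirmed on a small instance. The sketch below is a hedged illustration: `s` stands for $\sqrt{m}$, the red matching is assumed to consist of the `k` pairs $(\ell_j, r_j)$ for $j < k$, blue pairs are assumed to join consecutive matched vertices on each side, and the independent set is left out.

```python
from itertools import combinations

def build_square_graph(s, k):
    """Complete bipartite graph on s + s vertices with k red pairs removed
    and the corresponding blue pairs added on each side (k must be even)."""
    assert k % 2 == 0 and k <= s
    L = [("L", i) for i in range(s)]
    R = [("R", i) for i in range(s)]
    red = {frozenset((L[j], R[j])) for j in range(k)}
    blue = set()
    for j in range(0, k, 2):
        blue.add(frozenset((L[j], L[j + 1])))
        blue.add(frozenset((R[j], R[j + 1])))
    edges = ({frozenset((u, v)) for u in L for v in R} - red) | blue
    return L + R, edges, blue

def count_triangles(vertices, edges):
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if frozenset((a, b)) in edges
        and frozenset((a, c)) in edges
        and frozenset((b, c)) in edges
    )

s, k = 8, 4
V, E, blue = build_square_graph(s, k)
deg = {v: sum(1 for e in E if v in e) for v in V}
triangles = count_triangles(V, E)
print(triangles, k * (s - 2))
```

There are $k$ blue edges in total ($k/2$ per side), each closing a triangle with every vertex on the opposite side except the two red partners of its endpoints, which gives $k(s-2)$.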



Figure 4: An illustration of the bipartite component in the family $\mathcal{G}_2$ for $\sqrt{m} \le t \le \frac{1}{4}m$.


4.3.2 The processes $P_1$ and $P_2$

We introduce a small modification to the definition of the processes $P_1$ and $P_2$. Namely, we leave the answering process for pair queries as described in Subsection 4.1.2 and modify the answering process for random new-neighbor queries as follows. Let $t \le Q$, and let $\pi$ be a query-answer history of length $t - 1$ such that $\pi$ is consistent with $G_1$. If the query is a new-neighbor query $q_t = u$ and $d^{kn}_{\pi}(u) < \frac{1}{4}\sqrt{m}$, then the processes $P_1$ and $P_2$ answer as described in Subsection 4.1.2. However, if the query is a new-neighbor query $q_t = u$ such that $d^{kn}_{\pi}(u) \ge \frac{1}{4}\sqrt{m}$, then the processes answer as follows.

• The process $P_1$ answers with the set of all neighbors of $u$ in $G_1$. That is, if $u$ is in $L$, then the process replies with $a = R = \{r_1, \ldots, r_{\sqrt{m}}\}$, and if $u$ is in $R$, then the process replies with $a = L = \{\ell_1, \ldots, \ell_{\sqrt{m}}\}$.

• The process $P_2$ answers with $a = \{v_1, \ldots, v_{\sqrt{m}}\}$, where $\{v_1, \ldots, v_{\sqrt{m}}\}$ is the set of neighbors of $u$ in a subset of the graphs in $\mathcal{G}_2$. By the definition of $\mathcal{G}_2$, if $u$ is in $L$, then this set is either $R$, or it is $(R \setminus \{r_i\}) \cup \{\ell_j\}$ for some $r_i \in R$ and $\ell_j \in L$, and if $u$ is in $R$, then this set is either $L$, or it is $(L \setminus \{\ell_i\}) \cup \{r_j\}$ for some $\ell_i \in L$ and $r_j \in R$. For every such set $a \in Ans(\pi, q_t)$, the process returns $a$ as an answer with probability

$$\frac{|\mathcal{G}_2(\pi \circ (q_t,a))|}{|\mathcal{G}_2(\pi)|}.$$

We call this query an all-neighbors query.

First note that the above modification makes the algorithm "more powerful". That is, every algorithm that is not allowed all-neighbors queries can be emulated by an algorithm that is allowed this type of query. Therefore this only strengthens our lower bound results.

Also note that this modification does not affect the correctness of Lemma 17. We can redefine the function $\alpha_t(\pi)$ to be

$$\alpha_t(\pi) = \begin{cases} 1 & \text{if } q_t(\pi) \text{ is a pair query} \\ 1/\left(\sqrt{m} - d^{kn}_{\pi_{\le t-1}}(u)\right) & \text{if } q_t(\pi) = u \text{ is a random new-neighbor query} \\ 1 & \text{if } q_t(\pi) \text{ is an all-neighbors query} \end{cases}$$

and the rest of the proof follows as before.


4.3.3 The auxiliary graph 

For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u,v)$, the witness graphs in $\mathcal{A}_{\pi,(u,v)}$ are graphs in which $(u,v)$ is either a red pair or a blue pair. There is an edge between a witness graph $W$ and a non-witness graph $\overline{W}$ if the two graphs have the same set of four-tuples except for two matched squares: one that contains the pair $(u,v)$, namely $(u,v,u',v')$, and another one.

Definition 8. We define a switch between a matched square and an unmatched square in the following manner. Let $(u,v,u',v')$ be a matched square and $(x,y,x',y')$ be an unmatched square. Informally, a switch between the squares is "unmatching" the matched square and instead "matching" the unmatched square.

Formally, a switch consists of two steps. The first step is removing the edges $(u,v)$ and $(u',v')$ from the red matching $M^C$ and the edges $(u,u')$ and $(v,v')$ from the blue matchings $M^L$ and $M^R$ respectively. The second step is adding the edges $(x,y)$ and $(x',y')$ to the red matching $M^C$ and the edges $(x,x')$ and $(y,y')$ to the blue matchings $M^L$ and $M^R$ respectively. See Figure 5 for an illustration.
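The two steps of a switch translate directly into set operations on the three matchings. The sketch below is only an illustration of Definition 8; the set-based representation, the vertex labels and the function name `switch` are assumptions made for the example.

```python
def switch(red, blue_l, blue_r, matched, unmatched):
    """Switch the matched square (u, v, u2, v2) with the unmatched square (x, y, x2, y2).

    red holds ordered pairs (left, right); blue_l and blue_r hold unordered pairs.
    Returns the three updated matchings without modifying the inputs.
    """
    u, v, u2, v2 = matched
    x, y, x2, y2 = unmatched
    assert {(u, v), (u2, v2)} <= red, "first square must be matched"
    assert not ({(x, y), (x2, y2)} & red), "second square must be unmatched"
    # Step 1: remove the red and blue pairs of the matched square.
    new_red = red - {(u, v), (u2, v2)}
    new_blue_l = blue_l - {frozenset((u, u2))}
    new_blue_r = blue_r - {frozenset((v, v2))}
    # Step 2: add the red and blue pairs of the previously unmatched square.
    new_red |= {(x, y), (x2, y2)}
    new_blue_l |= {frozenset((x, x2))}
    new_blue_r |= {frozenset((y, y2))}
    return new_red, new_blue_l, new_blue_r

red = {("l0", "r0"), ("l1", "r1")}
blue_l = {frozenset(("l0", "l1"))}
blue_r = {frozenset(("r0", "r1"))}
new_red, new_blue_l, new_blue_r = switch(
    red, blue_l, blue_r, ("l0", "r0", "l1", "r1"), ("l2", "r2", "l3", "r3")
)
print(sorted(new_red))
```

A switch keeps the number of matched squares fixed, so the resulting graph has the same degrees and the same number of red and blue pairs as the original one.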


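Lemma 24. Let $t = k \cdot \sqrt{m}$ for an integer $k$ such that $1 \le k \le \frac{\sqrt{m}}{4}$ and let $Q = \frac{m^{3/2}}{600t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u,v)$,

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{16k}{m} = \frac{16t}{m^{3/2}}.$$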

Figure 5: An illustration of a switch between the squares $(u,v,u',v')$ and $(x,y,x',y')$.

Proof. We start with proving that $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{16}m^2$. A witness graph in $\mathcal{A}_{\pi,(u,v)}$ with respect to a pair $(u,v)$ is a graph in which $(u,v)$ is part of a matched square $(u,v,u',v')$. Potentially, $(u,v,u',v')$ could be switched with every unmatched square to get a non-witness graph. There are $\sqrt{m} - k$ unmatched vertices on each side, so that there are potential

squares. To get a graph that is in $\mathcal{G}_2(\pi)$, the unmatched square $(x,y,x',y')$ must be such that none of the induced pairs between the vertices $x, x', y, y'$ have been observed yet by the algorithm. When all-neighbors queries are allowed, if at most $Q$ queries have been performed, then at most $4Q$
pairs have been observed by the algorithm. Therefore, for at most of the potential 

squares, an induced pair was queried. Hence, every witness square can be switched with at least $\frac{1}{16}m^2$ consistent unmatched squares, implying that $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{16}m^2$.

To complete the proof it remains to show that $d_{nw}(\mathcal{A}_{\pi,(u,v)}) \le mk$. To this end we would like to analyze the number of witness graphs that every non-witness $\overline{W}$ can be "turned" into. In every non-witness graph $\overline{W}$ the pair $(u,v)$ is unmatched, and in order to turn $\overline{W}$ into a witness graph, one of the $k$ matched squares should be removed and the pair $(u,v)$ with an additional pair $(u',v')$ should be "matched". There are $k$ options to remove an existing square, and at most $m$ options to choose a pair $u',v'$ to match $(u,v)$ with. Therefore, the number of potential neighbors of $\overline{W}$ is at most $mk$. It follows that

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{16mk}{m^2} = \frac{16k}{m} = \frac{16t}{m^{3/2}},$$

and the proof is complete. □

4.3.4 Statistical distance 

For an all-neighbors query $q = u$ we say that the corresponding answer is a witness answer if $u \in L$ and $a \ne R$, or symmetrically if $u \in R$ and $a \ne L$. Let $\overline{E}^Q$ be the set of all query-answer histories $\pi$ of length $Q$ such that there exists a query-answer pair $(q,a)$ in $\pi$ in which $q$ is an all-neighbors query and $a$ is a witness answer with respect to that query, and let $E^Q = \Pi^Q \setminus \overline{E}^Q$. That is, $E^Q$ is the set of all query-answer histories of length $Q$ such that no all-neighbors query is answered with a witness answer. Let $\widetilde{P}_1$ and $\widetilde{P}_2$ be the induced distributions of the processes $P_1$ and $P_2$ conditioned on the event that the process does not reply with a witness answer. Observe that for every query-answer history $\pi$ of length $t - 1$, for every query $q_t$ that is either a pair query or a random new-neighbor query and for every $a \in Ans(\pi, q_t)$,

$$\Pr_{\widetilde{P}_b}[a \mid \pi, q_t] = \Pr_{P_b}[a \mid \pi, q_t]$$

for $b \in \{1,2\}$. Therefore, the proof of the next lemma is exactly the same as the proof of Lemma 19, except that occurrences of the term $(t/m^{3/2})$ are replaced by $(k/m)$ instead of $(1/\sqrt{m})$ and we apply Lemma 24 instead of Lemma 18.
Lemma 25. Let $t = k \cdot \sqrt{m}$ for an integer $k$ such that $1 \le k \le \frac{\sqrt{m}}{4}$ and let $Q = \frac{m^{3/2}}{600t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and for every pair or random new-neighbor query $q_t$,

$$\sum_{a \in Ans(\pi,q_t)} \left| \Pr_{\widetilde{P}_1}[a \mid \pi, q_t] - \Pr_{\widetilde{P}_2}[a \mid \pi, q_t] \right| \le \frac{96k}{m} = \frac{96t}{m^{3/2}}.$$

Note that Lemma 25 does not cover all-neighbors queries, and hence we establish the next lemma.


Lemma 26. Let $t = k \cdot \sqrt{m}$ for an integer $k$ such that $1 \le k \le \frac{\sqrt{m}}{4}$ and let $Q = \frac{m^{3/2}}{600t}$. For every $t \le Q$, every query-answer history $\pi$ of length $t - 1$ such that $\pi$ is consistent with $G_1$ and for every all-neighbors query $q_t$,

$$\Pr_{P_2}[a_t \text{ is a witness answer} \mid \pi, q_t] \le \frac{16k}{\sqrt{m}}.$$
Proof. Assume without loss of generality that $u \in L$. By the definition of the process $P_2$, it answers the query consistently with a uniformly selected random graph $G_2 \in \mathcal{G}_2(\pi)$ by returning the complete set of $u$'s neighbors in $G_2$. In $\mathcal{G}_2$, there are two types of graphs. First, there are graphs in which $u$ is not matched, that is $(u, u') \notin M^L$ for every vertex $u' \in L$. In these graphs the set of $u$'s neighbors is $R = \{r_1, \ldots, r_{\sqrt{m}}\}$. We refer to these graphs as non-witness graphs. The second type of graphs are those in which $(u, u') \in M^L$ for some $u' \in L$ and $(u, v) \in M^C$ for some $v \in R$. In these graphs the set of $u$'s neighbors is $(R \setminus \{v\}) \cup \{u'\}$. We refer to these graphs as witness graphs. As before, let $Ans(\pi,q_t)$ be the set of all possible answers for an all-neighbors query $q_t$. It holds that

$$\Pr_{P_2}[a_t \text{ is a witness answer} \mid \pi, q_t] = \sum_{a \in Ans(\pi,q_t),\, a \ne R} \Pr_{P_2}[a \mid \pi, q_t]$$
$$= \sum_{u' \in L,\, v \in R} \frac{|\mathcal{G}_2(\pi \circ ((u,u'),1) \circ ((u,v),0))|}{|\mathcal{G}_2(\pi)|}$$
$$= \sum_{u' \in L} \frac{|\mathcal{G}_2(\pi \circ ((u,u'),1))|}{|\mathcal{G}_2(\pi)|} \cdot \sum_{v \in R} \frac{|\mathcal{G}_2(\pi \circ ((u,u'),1) \circ ((u,v),0))|}{|\mathcal{G}_2(\pi \circ ((u,u'),1))|}$$
$$= \sum_{u' \in L} \frac{|\mathcal{G}_2(\pi \circ ((u,u'),1))|}{|\mathcal{G}_2(\pi)|}.$$

Similarly to the proof of Lemma 19, for every $u$ and $u'$ in $L$, $\frac{|\mathcal{G}_2(\pi \circ ((u,u'),1))|}{|\mathcal{G}_2(\pi)|} \le \frac{16k}{m}$. Therefore,

$$\Pr_{P_2}[a_t \text{ is a witness answer} \mid \pi, q_t] = \sum_{u' \in L} \frac{|\mathcal{G}_2(\pi \circ ((u,u'),1))|}{|\mathcal{G}_2(\pi)|} \le \sqrt{m} \cdot \frac{16k}{m} = \frac{16k}{\sqrt{m}},$$

and the lemma follows. □

It remains to prove that a similar lemma to Lemma 20 holds for $\sqrt{m} \le t \le \frac{1}{4}m$ (and the distributions $D^{ALG}_1$ and $D^{ALG}_2$ defined in this subsection).

Lemma 27. Let $t = k \cdot \sqrt{m}$ for an integer $k$ such that $1 \le k \le \frac{\sqrt{m}}{4}$. For every algorithm ALG that performs at most $Q = \frac{m^{3/2}}{600t}$ queries, the statistical distance between $D^{ALG}_1$ and $D^{ALG}_2$ is at most $\frac{1}{3}$.


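Proof. Let the sets $E^Q$ and $\overline{E}^Q$ be as defined in the beginning of this subsection. By the definition of the statistical distance, and since $\Pr_{P_1,ALG}[\overline{E}^Q] = 0$,

$$d(D^{ALG}_1, D^{ALG}_2) = \frac{1}{2}\left( \sum_{\pi \in \overline{E}^Q} \left| \Pr_{P_1,ALG}[\pi] - \Pr_{P_2,ALG}[\pi] \right| + \sum_{\pi \in E^Q} \left| \Pr_{P_1,ALG}[\pi] - \Pr_{P_2,ALG}[\pi] \right| \right)$$
$$= \frac{1}{2}\left( \Pr_{P_2,ALG}[\overline{E}^Q] + \sum_{\pi \in E^Q} \left| \Pr_{P_1,ALG}[\pi] - \Pr_{P_2,ALG}[\pi] \right| \right). \qquad (23)$$

By Lemma 26, the probability of detecting a witness as a result of an all-neighbors query is at most $\frac{16k}{\sqrt{m}}$. Since in $Q$ queries, there can be at most $4Q/\sqrt{m}$ all-neighbors queries, we have that

$$\Pr_{P_2,ALG}[\overline{E}^Q] \le \frac{4Q}{\sqrt{m}} \cdot \frac{16k}{\sqrt{m}} \le \frac{1}{6}. \qquad (24)$$

We now turn to upper bound the second term. Let $\alpha = \Pr_{P_2,ALG}[\overline{E}^Q]$.

$$\sum_{\pi \in E^Q} \left| \Pr_{P_1,ALG}[\pi] - \Pr_{P_2,ALG}[\pi] \right| = \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] \cdot \Pr_{P_1,ALG}[E^Q] - \Pr_{\widetilde{P}_2,ALG}[\pi] \cdot \Pr_{P_2,ALG}[E^Q] \right|$$
$$= \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] - (1 - \alpha) \cdot \Pr_{\widetilde{P}_2,ALG}[\pi] \right| \qquad (25)$$
$$\le \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] - \Pr_{\widetilde{P}_2,ALG}[\pi] \right| + \alpha \cdot \Pr_{\widetilde{P}_2,ALG}[E^Q]$$
$$\le \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] - \Pr_{\widetilde{P}_2,ALG}[\pi] \right| + \frac{1}{6}, \qquad (26)$$

where in Equation (25) we used the fact that $\Pr_{P_1,ALG}[E^Q] = 1$, and in Equation (26) we used the fact that $\Pr_{\widetilde{P}_2,ALG}[E^Q] = 1$ and $\alpha \le 1/6$.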

Therefore, it remains to bound

$$\sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] - \Pr_{\widetilde{P}_2,ALG}[\pi] \right|.$$

Let the hybrid distributions $D^{ALG}_{1,t}$ for $t \in [Q - 1]$ be as defined in Lemma 20 (based on the distributions $D^{ALG}_1$ and $D^{ALG}_2$ that are induced by the processes $P_1$ and $P_2$ that were defined in this subsection). Also, let $\widetilde{D}^{ALG}_{1,t}$ be the hybrid distribution conditioned on the event that no all-neighbors query is answered with a witness. That is, $\widetilde{D}^{ALG}_{1,t}$ is the distribution over query-answer histories $\pi$ of length $Q$, where in the length $t$ prefix ALG is answered by the process $P_1$, in the length $Q - t$ suffix ALG is answered by the process $P_2$, and each all-neighbors query is answered consistently with $G_1$ (so that no witness is observed). By the above definitions and the triangle inequality,

$$\sum_{\pi \in E^Q} \left| \Pr_{\widetilde{P}_1,ALG}[\pi] - \Pr_{\widetilde{P}_2,ALG}[\pi] \right| \le \sum_{t=0}^{Q-1} \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{D}^{ALG}_{1,t+1}}[\pi] - \Pr_{\widetilde{D}^{ALG}_{1,t}}[\pi] \right|. \qquad (27)$$
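As in the proof of Lemma 20 we have that for every $t \in [Q - 1]$,

$$\sum_{\pi \in E^Q} \left| \Pr_{\widetilde{D}^{ALG}_{1,t+1}}[\pi] - \Pr_{\widetilde{D}^{ALG}_{1,t}}[\pi] \right| \le \sum_{\substack{\pi' = \pi_1,\ldots,\pi_{t-1},\, q_t: \\ \pi' \in E^{t-1}}} \Pr_{\widetilde{P}_1,ALG}[\pi', q_t] \cdot \sum_{\substack{a \in Ans(\pi',q_t): \\ \pi' \circ (q_t,a) \in E^t}} \left| \Pr_{\widetilde{P}_1}[a \mid \pi', q_t] - \Pr_{\widetilde{P}_2}[a \mid \pi', q_t] \right|. \qquad (28)$$

By Lemma 25 (and since for an all-neighbors query $q_t$ we have that the (unique) answer according to $\widetilde{P}_2$ is the same as according to $\widetilde{P}_1$),

$$\sum_{\substack{a \in Ans(\pi',q_t): \\ \pi' \circ (q_t,a) \in E^t}} \left| \Pr_{\widetilde{P}_1}[a \mid \pi', q_t] - \Pr_{\widetilde{P}_2}[a \mid \pi', q_t] \right| \le \frac{96k}{m} = \frac{96t}{m^{3/2}},$$

and it follows that

$$\sum_{\pi \in E^Q} \left| \Pr_{\widetilde{D}^{ALG}_{1,t+1}}[\pi] - \Pr_{\widetilde{D}^{ALG}_{1,t}}[\pi] \right| \le \frac{96t}{m^{3/2}}.$$

Hence, for $Q = \frac{m^{3/2}}{600t}$,

$$\sum_{t=0}^{Q-1} \sum_{\pi \in E^Q} \left| \Pr_{\widetilde{D}^{ALG}_{1,t+1}}[\pi] - \Pr_{\widetilde{D}^{ALG}_{1,t}}[\pi] \right| \le Q \cdot \frac{96t}{m^{3/2}} \le \frac{1}{6}. \qquad (29)$$

Combining Equations (23), (24), (26), (27) and (29), we get

$$d(D^{ALG}_1, D^{ALG}_2) \le \frac{1}{2}\left(\frac{1}{6} + \frac{1}{6} + \frac{1}{6}\right) \le \frac{1}{3}, \qquad (30)$$

and the proof is complete. □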


4.4 Lower Bound for $t < \frac{1}{4}\sqrt{m}$

4.4.1 The construction

In this case the basic structure of $G_1$ and $\mathcal{G}_2$ is a bit different. Also, for the sake of simplicity, we present graphs with $2m$ edges, and either $0$ or $4t$ triangles. The graph $G_1$ has three components: two complete bipartite graphs, each over $2\sqrt{m}$ vertices, and an independent set of size $n - 4\sqrt{m}$. Let $A$ and $B$ be the left-hand side and the right-hand side sets, respectively, of the first bipartite component, and $C$ and $D$ of the second one. We refer to the edges between $A$ and $B$ and the edges between $C$ and $D$ as black edges. We divide each of these sets into $\frac{\sqrt{m}}{t}$ subsets of size $t$, denoted $\{X_1, \ldots, X_{\sqrt{m}/t}\}$ for $X \in \{A, B, C, D\}$. For every $1 \le i \le \frac{\sqrt{m}}{t}$ we first remove a complete bipartite graph between $A_i$ and $B_i$ and between $C_i$ and $D_i$, and refer to the removed edges as red edges. We then add a complete bipartite graph between $B_i$ and $C_i$ and between $D_i$ and $A_i$, and refer to added edges as blue edges. Note that this maintains the degrees of all the vertices to be $\sqrt{m}$.

In $\mathcal{G}_2$ the basic structure of all the graphs is the same as of $G_1$ with the following modifications. Each graph is defined by the choice of four "special" vertices $a^*, b^*, c^*, d^*$ such that $a^* \in A_{i_{a^*}}$, $b^* \in B_{i_{b^*}}$, $c^* \in C_{i_{c^*}}$ and $d^* \in D_{i_{d^*}}$ for some indices $i_{a^*}$, $i_{b^*}$, $i_{c^*}$ and $i_{d^*}$ such that no two indices are equal. We then add edges $(a^*,c^*)$ and $(b^*,d^*)$, referred to as green edges, and remove edges $(a^*,b^*)$ and $(c^*,d^*)$, referred to as purple edges. We also refer to the green and purple edges as special edges. Note that we add one edge and remove one edge from each special vertex, thus maintaining their initial degrees. See Figure 6.

Figure 6: An illustration of a graph in $\mathcal{G}_2$. The broken thin (red) edges describe edges that were removed and the thin (blue) edges describe edges that were added. The broken thick (purple) edges describe the special non-edges $(a^*,b^*)$ and $(c^*,d^*)$. The curly (green) edges describe the special edges $(a^*,c^*)$ and $(b^*,d^*)$.

We first prove that $t(G_1) = 0$ and then that for every graph $G$ in $\mathcal{G}_2$, $t(G) = 4t$.

Claim 28. The graph $G_1$ has no triangles.

Proof. Consider an edge $(u,v)$ in $G_1$. First assume $u$ and $v$ are connected by a black edge, that is, they are on different sides of the same bipartite component. Hence we can assume without loss of generality that $u \in A$ and that $v \in B$. Since $u$ is in $A$ it is only connected to vertices in $B$ or vertices in $D$. Since $v$ is in $B$ it is only connected to vertices in $A$ or vertices in $C$. Thus $u$ and $v$ cannot have a common neighbor. A similar analysis can be done for a pair $(u,v)$ that is connected by a blue edge. Therefore $t(G_1)$ is indeed zero as claimed. □

Claim 29. For every graph $G \in \mathcal{G}_2$, $t(G) = 4t$.

Proof. Since the only differences between $G_1$ and graphs in $\mathcal{G}_2$ are the two added green edges and the two removed purple edges, any triangle in a graph of $\mathcal{G}_2$ must include a green edge. Therefore we can count all the triangles that the green edges form. Consider the green edge $(a^*,c^*)$ and recall that $a^*$ is in $A_{i_{a^*}}$ and $c^*$ is in $C_{i_{c^*}}$. The only common neighbors of $a^*$ and $c^*$ are all the vertices in $B_{i_{c^*}}$ and all the vertices in $D_{i_{a^*}}$. A vertex $v$ such that $v \notin B_{i_{c^*}}$ and $v \notin D_{i_{a^*}}$ is either (1) in $A$ or in $D \setminus D_{i_{a^*}}$, in which case it is not a neighbor of $a^*$, or it is (2) in $C$ or in $B \setminus B_{i_{c^*}}$, in which case it is not a neighbor of $c^*$. Since both $B_{i_{c^*}}$ and $D_{i_{a^*}}$ are of size $t$, the edge $(a^*, c^*)$ participates in $2t$ triangles. Similarly the edge $(b^*,d^*)$ participates in $2t$ triangles, and together we get that $t(G) = 4t$, as claimed. □
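Claims 28 and 29 can be verified by brute force on a small instance of the construction. In the sketch below, `s` stands for $\sqrt{m}$, each of $A, B, C, D$ has `s` vertices split into consecutive blocks of size `t`, and the special vertices are passed as indices; these representation choices and the function names are illustrative assumptions, and the independent set is omitted.

```python
from itertools import combinations

def block(v, t):
    """Index of the size-t block that vertex v = (side, index) belongs to."""
    return v[1] // t

def build_g1(s, t):
    """Black edges minus block-wise red edges, plus block-wise blue edges."""
    assert s % t == 0
    sides = {name: [(name, i) for i in range(s)] for name in "ABCD"}
    edges = set()
    for x, y in (("A", "B"), ("C", "D")):   # black edges, red ones left out
        edges |= {frozenset((u, v)) for u in sides[x] for v in sides[y]
                  if block(u, t) != block(v, t)}
    for x, y in (("B", "C"), ("D", "A")):   # blue edges
        edges |= {frozenset((u, v)) for u in sides[x] for v in sides[y]
                  if block(u, t) == block(v, t)}
    vertices = [v for name in "ABCD" for v in sides[name]]
    return vertices, edges

def build_g2(s, t, a, b, c, d):
    """Apply the special-vertex modification for indices a, b, c, d in distinct blocks."""
    vertices, edges = build_g1(s, t)
    a, b, c, d = ("A", a), ("B", b), ("C", c), ("D", d)
    assert len({block(v, t) for v in (a, b, c, d)}) == 4
    edges -= {frozenset((a, b)), frozenset((c, d))}   # purple (removed) edges
    edges |= {frozenset((a, c)), frozenset((b, d))}   # green (added) edges
    return vertices, edges

def count_triangles(vertices, edges):
    return sum(
        1
        for x, y, z in combinations(vertices, 3)
        if frozenset((x, y)) in edges
        and frozenset((x, z)) in edges
        and frozenset((y, z)) in edges
    )

s, t = 8, 2
V1, E1 = build_g1(s, t)
V2, E2 = build_g2(s, t, a=0, b=2, c=4, d=6)
deg1 = {v: sum(1 for e in E1 if v in e) for v in V1}
deg2 = {v: sum(1 for e in E2 if v in e) for v in V2}
print(count_triangles(V1, E1), count_triangles(V2, E2))
```

The base graph is bipartite between $A \cup C$ and $B \cup D$, which is why it is triangle-free, and all degrees stay at $s$ after the special edges are swapped.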

4.4.2 The processes $P_1$ and $P_2$

The definition of the processes $P_1$ and $P_2$ is the same as in Subsection 4.3.2 (using the modified definitions of $G_1$ and $\mathcal{G}_2$).

4.4.3 The auxiliary graph 

We define a switch for this case as well. Informally, a switch between a matched pair $(u^*,v^*)$ and an unmatched pair $(u,v)$ is "unmatching" $(u^*,v^*)$ and "matching" $(u,v)$ instead. Formally, we define a switch as follows.

Definition 9. A switch between a green pair $(a^*,c^*)$ and a pair $(a,c)$ such that $a \in A_i$, $c \in C_j$ and none of the indices $i, j, i_{b^*}, i_{d^*}$ are equal, is the following two-step process. In the first step we "unmatch" $(a^*,c^*)$ by removing the green edge $(a^*,c^*)$ and adding the edges $(a^*,b^*)$ and $(c^*,d^*)$. In the second step we "match" $(a,c)$ by adding the green edge $(a,c)$ and removing the edges $(a,b^*)$ and $(c,d^*)$. A switch with the pair $(b^*,d^*)$ can be defined in a similar manner.




Figure 7: An illustration of a switch between the pairs (a*,c*) and (a, c). 

Let t < y/rn and let Q = For every t < Q, every query-answer history tt of length t—1 and 
every pair $(u,v)$ we define the following auxiliary graph. The witness nodes are graphs in which $(u,v)$ is one of the four special pairs. If the pair is a green matched pair then there is an edge in the auxiliary graph between a witness graph $W$ and a non-witness graph $\overline{W}$, if $\overline{W}$ can be obtained from $W$ by a single switch between $(u,v)$ and another unmatched pair.

Lemma 30. For t < \^/m let Q = For every t < Q, every query-answer history tt of length 
$t - 1$ such that $\pi$ is consistent with $G_1$ and every pair $(u,v)$,

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{8}{m}.$$


Proof. We analyze the case where the pair $(u, v)$ is such that $u \in A$ and $v \in C$, as the proof for
the other cases is almost identical. We first prove that $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{8}m$. A witness graph $W$
is a graph in which $(u, v)$ is a special pair, that is, $(u, v) = (a^*, c^*)$. Potentially, for every pair
$(a', c')$ such that $a' \in A_i$, $c' \in C_j$ and none of the indices $i, j, i_{b^*}, i_{d^*}$ are equal, the graph resulting
from a switch between $(a^*, c^*)$ and $(a', c')$ is a non-witness graph. There are $\sqrt{m} - 2t$ vertices $a'$ in
$A \setminus (A_{i_{b^*}} \cup A_{i_{d^*}})$, and for each such $a'$ there are $\sqrt{m} - 2t$ vertices $c'$ in $C \setminus (C_{i_{b^*}} \cup C_{i_{d^*}})$. Since
$t < \frac{1}{4}\sqrt{m}$, there are at least $(\sqrt{m} - 2t) \cdot (\sqrt{m} - 2t) = m - 4t\sqrt{m} + 4t^2 \ge \frac{1}{4}m$ potential pairs $(a', c')$ that
$(a^*, c^*)$ could be switched with. For the resulting graph to be consistent, that is, to be in $\mathcal{G}_2(\pi)$,
the pair $(a', c')$ must be such that the pairs $(a', c')$, $(a^*, b^*)$ and $(c^*, d^*)$ have not been observed yet
by the algorithm. Since the number of queries is at most $Q$, at least $\frac{1}{8}m$ of the
potential pairs $(a', c')$ can be switched with $(a^*, c^*)$ such that the resulting graph is consistent with
$\mathcal{G}_2(\pi)$. Therefore, $d_w(\mathcal{A}_{\pi,(u,v)}) \ge \frac{1}{8}m$.

Now consider a non-witness graph $W'$. There is only one way to turn $W'$ into a witness
graph, which is to switch the pair $(u, v)$ with the green pair $(a^*, c^*)$. Therefore, the maximal degree
of a non-witness graph, $d_{nw}(\mathcal{A}_{\pi,(u,v)})$, is 1.

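Together we get that

$$\frac{d_{nw}(\mathcal{A}_{\pi,(u,v)})}{d_w(\mathcal{A}_{\pi,(u,v)})} \le \frac{1}{\frac{1}{8}m} = \frac{8}{m},$$

and the proof is complete. □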
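
The counting step in the proof rests on an elementary inequality: if $t < \frac{1}{4}\sqrt{m}$ then $\sqrt{m} - 2t > \frac{1}{2}\sqrt{m}$, and so $(\sqrt{m} - 2t)^2 > \frac{1}{4}m$. The following sanity check verifies this numerically for every integer $t$ in range. It assumes that the threshold is read as $t < \frac{1}{4}\sqrt{m}$, and the helper name is hypothetical.

```python
import math


def potential_pairs_bound_holds(m):
    """Check (sqrt(m) - 2t)^2 > m/4 for every integer t with 1 <= t < sqrt(m)/4."""
    root = math.sqrt(m)
    t = 1
    while t < root / 4:
        if (root - 2 * t) ** 2 <= m / 4:
            return False
        t += 1
    # Also reached when no integer t lies in the range (vacuously true).
    return True


# A few arbitrary values of m, chosen only for illustration.
results = {m: potential_pairs_bound_holds(m) for m in (16, 100, 10_000, 123_457)}
```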


4.4.4 Statistical distance 


A proof similar to those of Lemma 25 and Lemma 26, using Lemma 30, gives the following
lemmas for the case $1 < t < \frac{1}{4}\sqrt{m}$.

Lemma 31. Let $1 < t < \frac{1}{4}\sqrt{m}$ and $Q =$ For every $t < Q$, every query-answer history $\pi$ of
length $t - 1$ such that $\pi$ is consistent with $G_1$ and for every all-neighbors query $q_t$,


VipJat is a witness answer I tt, gd < — . 

m 

Lemma 32. Let $1 < t < \frac{1}{4}\sqrt{m}$ and $Q =$ For every $t < Q$, every query-answer history $\pi$ of
length $t - 1$ such that $\pi$ is consistent with $G_1$ and for every pair or random new-neighbor query $q_t$,

$$\sum_{a \in Ans(\pi, q_t)} \left| \Pr_{P_1}[a \mid \pi, q_t] - \Pr_{P_2}[a \mid \pi, q_t] \right| \le \frac{96}{m}.$$
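
The quantity bounded in Lemma 32 is a sum, over the possible answers to a query, of the gap between the probabilities that the two processes assign to each answer. A minimal sketch of that computation follows; the dictionary representation, the function name and the toy answer labels are assumptions made for illustration.

```python
def l1_distance(p1, p2):
    """Sum over all answers a of |p1[a] - p2[a]|.

    p1 and p2 map answers to probabilities; an answer missing from one
    of the dictionaries is treated as having probability 0 there.
    """
    answers = set(p1) | set(p2)
    return sum(abs(p1.get(a, 0.0) - p2.get(a, 0.0)) for a in answers)


# Toy distributions: the second process returns a "witness" answer
# with probability 1/4, the first never does.
p1 = {"no edge": 1.0}
p2 = {"no edge": 0.75, "witness": 0.25}
gap = l1_distance(p1, p2)
```

This sum is twice the total variation distance between the two answer distributions.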


The next lemma is proven in a similar way to 1.3.4, based on the above two lemmas.

Lemma 33. Let 1 < t < \y/m. For every algorithm ALG that asks at most Q = the statistical 
distance between and is at most 


4.5 Wrapping things up 

Theorem 15 follows from Lemmas 20, 23, 27 and 33, and the next corollary is proved using Theorems 
15 and 14. 

Corollary 34. Any multiplicative-approximation algorithm for the number of triangles in a graph
must perform $\Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$ queries, where the allowed queries are degree queries,
pair queries and neighbor queries.

Proof. Assume towards a contradiction that there exists an algorithm ALG’ for which the following
holds:

1. ALG’ is allowed to ask neighbor queries as well as degree queries and pair queries.

2. ALG’ asks $Q'$ queries.

3. ALG’ outputs a $(1 \pm \epsilon)$-approximation to the number of triangles of any graph $G$ with
probability greater than 2/3.

Using ALG’ we can define an algorithm ALG that is allowed random new-neighbor queries, performs
at most $Q = 3Q'$ queries and answers correctly with the same probability as ALG’ does. ALG runs
ALG’ and whenever ALG’ performs a query $q'_t$, ALG does as follows:

• If $q'_t$ is a degree query, ALG performs the same query and sets $a'_t = a_t$.

• If $q'_t$ is a pair query $(u, v)$, then ALG performs the same query $q_t = q'_t$. Let $a_t$ be the
corresponding answer.

— If $a_t = 0$, then ALG sets $a'_t = a_t$.

— If $a_t = 1$, then ALG sets $a'_t = (a_t, i, j)$, where $i$ and $j$ are randomly chosen labels
that have not been previously used for neighbors of $u$ and $v$, and are within the ranges
$[1..d(u)]$ and $[1..d(v)]$, respectively.

• If $q'_t$ is a neighbor query $(u, i)$, ALG performs a random new-neighbor query $q_t = u$, and
returns the same answer $a'_t = a_t$.

We note that the above requires the algorithm ALG to store, for every vertex $v$, all the labels used for
its neighbors in the previous steps. Once ALG’ outputs an answer, ALG outputs the same answer.
It follows that ALG performs at most $3Q'$ queries to the graph $G$. By the third assumption above,
ALG outputs a $(1 \pm \epsilon)$-approximation to the number of triangles of any graph $G$ with probability
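greater than 2/3. If $Q' \notin \Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$, then the same holds for $Q = 3Q'$,
which is a contradiction to Theorem 14 and Theorem 15. □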
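
The simulation described in the proof can be sketched as a thin wrapper that answers the queries of ALG’ while keeping the label bookkeeping. Everything named below is an assumption of the sketch: the classes `GraphOracle` and `LabelSimulator`, the toy adjacency-list graph, the assumed semantics of a random new-neighbor query (a uniformly random neighbor not yet revealed at that vertex), and the caching of repeated neighbor queries.

```python
import random


class GraphOracle:
    """Toy oracle with degree, pair and random new-neighbor queries.

    Assumed semantics: random_new_neighbor(u) returns a uniformly random
    neighbor of u not yet revealed at u, or None if all have been revealed.
    """

    def __init__(self, adj, seed=0):
        self.adj = {u: set(vs) for u, vs in adj.items()}
        self.revealed = {u: set() for u in adj}
        self.rng = random.Random(seed)
        self.queries = 0  # number of queries made to the graph

    def degree(self, u):
        self.queries += 1
        return len(self.adj[u])

    def pair(self, u, v):
        self.queries += 1
        if v not in self.adj[u]:
            return 0
        self.revealed[u].add(v)
        self.revealed[v].add(u)
        return 1

    def random_new_neighbor(self, u):
        self.queries += 1
        fresh = sorted(self.adj[u] - self.revealed[u])
        if not fresh:
            return None
        v = self.rng.choice(fresh)
        self.revealed[u].add(v)
        return v


class LabelSimulator:
    """Answers degree, pair and i-th neighbor queries via GraphOracle."""

    def __init__(self, oracle, seed=1):
        self.oracle = oracle
        self.rng = random.Random(seed)
        self.by_label = {}     # by_label[u][i] = neighbor stored under label i
        self.by_neighbor = {}  # by_neighbor[u][v] = label of v in u's list

    def _store(self, u, i, v):
        self.by_label.setdefault(u, {})[i] = v
        self.by_neighbor.setdefault(u, {})[v] = i

    def _label_for(self, u, v):
        """Label of v at u; draws a random unused label in [1..d(u)] if needed."""
        known = self.by_neighbor.get(u, {})
        if v in known:
            return known[v]
        d = self.oracle.degree(u)
        used = self.by_label.get(u, {})
        i = self.rng.choice([k for k in range(1, d + 1) if k not in used])
        self._store(u, i, v)
        return i

    def degree(self, u):
        return self.oracle.degree(u)

    def pair(self, u, v):
        # At most three oracle queries: the pair query and two degree queries.
        if self.oracle.pair(u, v) == 0:
            return 0
        return (1, self._label_for(u, v), self._label_for(v, u))

    def neighbor(self, u, i):
        """i-th neighbor of u; assumes 1 <= i <= d(u)."""
        cached = self.by_label.get(u, {})
        if i in cached:
            return cached[i]
        v = self.oracle.random_new_neighbor(u)
        if v is not None:
            self._store(u, i, v)
        return v


adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
oracle = GraphOracle(adj)
sim = LabelSimulator(oracle)
answers = [sim.degree(0), sim.pair(0, 1), sim.neighbor(0, 3), sim.pair(1, 3)]
```

In this sketch each simulated query costs at most three oracle queries, which matches the factor 3 in the bound $Q = 3Q'$.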